Tiffany Chan

Natural Language Processing Project

Sentiment Analysis Classification of Twitter Airline Tweets

1. Import the libraries, load dataset, print shape of data, data description (5 Marks).

In [178]:
#Importing necessary libraries.

import re

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from bs4 import BeautifulSoup
import contractions
import nltk
from nltk.corpus import stopwords
import spacy

from scipy.stats import randint as sp_randint
from sklearn import svm
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import RandomizedSearchCV, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

nltk.download('punkt')               #Tokenizer models used by nltk.word_tokenize
nlp = spacy.load('en_core_web_sm')   #Small English spaCy pipeline used for lemmatization
In [179]:
#Load dataset
#Read csv file
tweet = pd.read_csv('Tweets.csv')
In [180]:
#Shape of data. Data dimensions
tweet.shape
Out[180]:
(14640, 15)

There are 14640 rows (tweets to analyze), and 15 columns (features).

In [640]:
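#Descriptive statistics (Data description)
#Note: without parentheses, describe is not called. The output below is the repr of the bound method,
#which prints the DataFrame itself. Use tweet.describe() to get summary statistics.
tweet.describe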
Out[640]:
<bound method NDFrame.describe of                  tweet_id airline_sentiment  airline_sentiment_confidence  \
0      570306133677760513           neutral                        1.0000   
1      570301130888122368          positive                        0.3486   
2      570301083672813571           neutral                        0.6837   
3      570301031407624196          negative                        1.0000   
4      570300817074462722          negative                        1.0000   
...                   ...               ...                           ...   
14635  569587686496825344          positive                        0.3487   
14636  569587371693355008          negative                        1.0000   
14637  569587242672398336           neutral                        1.0000   
14638  569587188687634433          negative                        1.0000   
14639  569587140490866689           neutral                        0.6771   

               negativereason  negativereason_confidence         airline  \
0                         NaN                        NaN  Virgin America   
1                         NaN                     0.0000  Virgin America   
2                         NaN                        NaN  Virgin America   
3                  Bad Flight                     0.7033  Virgin America   
4                  Can't Tell                     1.0000  Virgin America   
...                       ...                        ...             ...   
14635                     NaN                     0.0000        American   
14636  Customer Service Issue                     1.0000        American   
14637                     NaN                        NaN        American   
14638  Customer Service Issue                     0.6659        American   
14639                     NaN                     0.0000        American   

      airline_sentiment_gold             name negativereason_gold  \
0                        NaN          cairdin                 NaN   
1                        NaN         jnardino                 NaN   
2                        NaN       yvonnalynn                 NaN   
3                        NaN         jnardino                 NaN   
4                        NaN         jnardino                 NaN   
...                      ...              ...                 ...   
14635                    NaN  KristenReenders                 NaN   
14636                    NaN         itsropes                 NaN   
14637                    NaN         sanyabun                 NaN   
14638                    NaN       SraJackson                 NaN   
14639                    NaN        daviddtwu                 NaN   

       retweet_count                                               text  \
0                  0                @VirginAmerica What @dhepburn said.   
1                  0  @VirginAmerica plus you've added commercials t...   
2                  0  @VirginAmerica I didn't today... Must mean I n...   
3                  0  @VirginAmerica it's really aggressive to blast...   
4                  0  @VirginAmerica and it's a really big bad thing...   
...              ...                                                ...   
14635              0  @AmericanAir thank you we got on a different f...   
14636              0  @AmericanAir leaving over 20 minutes Late Flig...   
14637              0  @AmericanAir Please bring American Airlines to...   
14638              0  @AmericanAir you have my money, you change my ...   
14639              0  @AmericanAir we have 8 ppl so we need 2 know h...   

      tweet_coord              tweet_created tweet_location  \
0             NaN  2015-02-24 11:35:52 -0800            NaN   
1             NaN  2015-02-24 11:15:59 -0800            NaN   
2             NaN  2015-02-24 11:15:48 -0800      Lets Play   
3             NaN  2015-02-24 11:15:36 -0800            NaN   
4             NaN  2015-02-24 11:14:45 -0800            NaN   
...           ...                        ...            ...   
14635         NaN  2015-02-22 12:01:01 -0800            NaN   
14636         NaN  2015-02-22 11:59:46 -0800          Texas   
14637         NaN  2015-02-22 11:59:15 -0800  Nigeria,lagos   
14638         NaN  2015-02-22 11:59:02 -0800     New Jersey   
14639         NaN  2015-02-22 11:58:51 -0800     dallas, TX   

                    user_timezone  
0      Eastern Time (US & Canada)  
1      Pacific Time (US & Canada)  
2      Central Time (US & Canada)  
3      Pacific Time (US & Canada)  
4      Pacific Time (US & Canada)  
...                           ...  
14635                         NaN  
14636                         NaN  
14637                         NaN  
14638  Eastern Time (US & Canada)  
14639                         NaN  

[14640 rows x 15 columns]>
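
Since `describe` is a method, it only computes statistics when it is called with parentheses. A minimal sketch on a small made-up frame (the column names echo the dataset, but the values are invented for illustration and are not taken from Tweets.csv):

```python
import pandas as pd

# Small illustrative frame; the values are made up for this sketch.
demo = pd.DataFrame({
    "airline_sentiment_confidence": [1.0, 0.5, 0.75, 1.0],
    "retweet_count": [0, 0, 2, 1],
})

# Calling the method (note the parentheses) returns count, mean, std,
# min, quartiles and max for each numeric column.
summary = demo.describe()
print(summary)
```

Passing `include="all"` to `describe` would also summarise object (text) columns such as 'airline_sentiment'.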

Most of these features are not pertinent to this project, so we will limit the data to the 'text' and 'airline_sentiment' columns.

In [80]:
#Looking at the data type of each column to see if we would need to convert any of them later.
tweet.dtypes
Out[80]:
tweet_id                          int64
airline_sentiment                object
airline_sentiment_confidence    float64
negativereason                   object
negativereason_confidence       float64
airline                          object
airline_sentiment_gold           object
name                             object
negativereason_gold              object
retweet_count                     int64
text                             object
tweet_coord                      object
tweet_created                    object
tweet_location                   object
user_timezone                    object
dtype: object

We only need the 'text' and 'airline_sentiment' features. Both are listed as object dtype here, which is how pandas stores Python strings. We may still need to convert 'text' to string explicitly later.

In [641]:
#Let's look at the frequency of tweet sentiments.
tweet['airline_sentiment'].value_counts()
Out[641]:
negative    9178
neutral     3099
positive    2363
Name: airline_sentiment, dtype: int64

Looking at the frequency of the different classes, we can tell that the classes are imbalanced. Negative tweets alone make up 63% of the data; 21% are neutral and 16% are positive. A model trained on this data may therefore be biased towards the majority (negative) class.
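
The class shares can be checked directly from the counts. A small sketch that uses only the three counts printed above (the variable names are illustrative):

```python
# Class counts copied from the value_counts() output above.
counts = {"negative": 9178, "neutral": 3099, "positive": 2363}

total = sum(counts.values())  # should equal the number of rows, 14640

# Share of each class as a percentage, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

On the DataFrame itself, `tweet['airline_sentiment'].value_counts(normalize=True)` returns the same proportions as fractions.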

2. Understanding of data columns (5 Marks):

a. Drop all other columns except “text” and “airline_sentiment”.

b. Check the shape of data.

c. Print first 5 rows of data.

In [182]:
# a. Drop all other columns except “text” and “airline_sentiment”.
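# .copy() makes tweet2 an independent DataFrame, so later column assignments do not trigger SettingWithCopyWarning
tweet2 = tweet[['text', 'airline_sentiment']].copy()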
In [183]:
# b. Check the shape of data.
tweet2.shape
Out[183]:
(14640, 2)
In [184]:
# c. Print first 5 rows of data.
tweet2.head()
Out[184]:
text airline_sentiment
0 @VirginAmerica What @dhepburn said. neutral
1 @VirginAmerica plus you've added commercials t... positive
2 @VirginAmerica I didn't today... Must mean I n... neutral
3 @VirginAmerica it's really aggressive to blast... negative
4 @VirginAmerica and it's a really big bad thing... negative

You can see that even though the text is stored in a dataframe, it is still unstructured data. We need to clean it up before we feed it into our ML models.

In [185]:
#Let's use this tweet (#2040) as an example to verify that the pre-processing steps work.

print (tweet2["text"][2040])
@united oh, I'll be sharing alright. Especially about sleeping in this shitty airport and getting 1hr of sleep all night because UA.

3. Text pre-processing: Data preparation (20 Marks):

a. Html tag removal.

b. Tokenization.

c. Remove the numbers.

d. Removal of Special Characters and Punctuations.

e. Conversion to lowercase.

f. Lemmatize or stemming.

g. Join the words in the list to convert back to text string in the dataframe. (So that each row contains the data in text format.)

h. Print first 5 rows of data after pre-processing.

In [186]:
# 3a. Html tag removal
# We should run this in the odd chance that there is a html tag in the text column

def strip_html(text):
    soup = BeautifulSoup(text, "html.parser")
    return soup.get_text()

tweet2['text'] = tweet2['text'].apply(lambda x: strip_html(x))
tweet2.head()
Out[186]:
text airline_sentiment
0 @VirginAmerica What @dhepburn said. neutral
1 @VirginAmerica plus you've added commercials t... positive
2 @VirginAmerica I didn't today... Must mean I n... neutral
3 @VirginAmerica it's really aggressive to blast... negative
4 @VirginAmerica and it's a really big bad thing... negative
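
BeautifulSoup is the tool this notebook uses for tag removal. To illustrate what the step does to a string, here is a sketch of the same idea with the standard library's `html.parser`. This is a swapped-in illustration rather than the notebook's method, and the class and function names are hypothetical:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()  # character references are converted by default
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html_stdlib(text):
    """Return the text with tags removed and entities such as &amp; decoded."""
    parser = TextExtractor()
    parser.feed(text)
    parser.close()  # flush any text still buffered
    return "".join(parser.parts)


print(strip_html_stdlib("<b>Great</b> flight &amp; crew"))
print(strip_html_stdlib("@united oh, I'll be sharing alright."))
```

A tweet without markup passes through unchanged, which matches the identical head() output before and after this step.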
In [188]:
# b. Tokenization

import nltk                            #Repeated here just in case
nltk.download('punkt')                 #Repeated here just in case

# Split each tweet into a list of word and punctuation tokens
tweet2['text'] = tweet2['text'].apply(nltk.word_tokenize)
In [189]:
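# c. Remove the numbers

# astype(str) converts each token list into its string representation,
# e.g. "['@', 'VirginAmerica', ...]", so the steps below operate on one string per row.
tweet2['text'] = tweet2['text'].astype(str)

def remove_numbers(text):
    """Delete every run of digits from the text."""
    return re.sub(r'\d+', '', text)

tweet2['text'] = tweet2['text'].apply(remove_numbers)
tweet2.head()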
Out[189]:
text airline_sentiment
0 ['@', 'VirginAmerica', 'What', '@', 'dhepburn'... neutral
1 ['@', 'VirginAmerica', 'plus', 'you', "'ve", '... positive
2 ['@', 'VirginAmerica', 'I', 'did', "n't", 'tod... neutral
3 ['@', 'VirginAmerica', 'it', "'s", 'really', '... negative
4 ['@', 'VirginAmerica', 'and', 'it', "'s", 'a',... negative

You can see from the output that tokenization was successful: each word and symbol is surrounded by its own set of quotation marks and separated by commas. The 'text' column now holds the string representation of each token list.
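
Because `astype(str)` turns each token list into its printed form, the later steps run on one long string per row. An alternative, sketched here under the assumption that the tokens are kept as a Python list, is to clean token by token and join at the end. The token list is typed in by hand to resemble tokenizer output, and the helper name is hypothetical:

```python
import re

# Hand-written tokens resembling tokenizer output; nltk is not called here.
tokens = ["@", "united", "getting", "1hr", "of", "sleep", "20", "minutes"]


def remove_numbers_from_tokens(tokens):
    """Strip digits from each token and drop tokens that become empty."""
    cleaned = (re.sub(r"\d+", "", tok) for tok in tokens)
    return [tok for tok in cleaned if tok]


no_digits = remove_numbers_from_tokens(tokens)
print(no_digits)            # still a Python list of tokens
print(" ".join(no_digits))  # joined back into a single string
```

Working on the list avoids leaving brackets, quotes and commas in the text that a later step then has to strip out.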

In [190]:
# d. Removal of special characters and punctuation.
# First expand contractions (e.g. "I'll" -> "I will")
import contractions

def replace_contractions(text):
    """Expand contractions in a string of text."""
    return contractions.fix(text)

tweet2['text'] = tweet2['text'].apply(replace_contractions)
tweet2.head()
Out[190]:
text airline_sentiment
0 ['@', 'VirginAmerica', 'What', '@', 'dhepburn'... neutral
1 ['@', 'VirginAmerica', 'plus', 'you', "'ve", '... positive
2 ['@', 'VirginAmerica', 'I', 'did', "n't", 'tod... neutral
3 ['@', 'VirginAmerica', 'it', "'s", 'really', '... negative
4 ['@', 'VirginAmerica', 'and', 'it', "'s", 'a',... negative
In [191]:
print (tweet2["text"][2040])
['@', 'united', 'oh', ',', 'I', " will", 'be', 'sharing', 'alright', '.', 'Especially', 'about', 'sleeping', 'in', 'this', 'shitty', 'airport', 'and', 'getting', 'hr', 'of', 'sleep', ', 'night', 'because', 'UA', '.']

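The contraction in this tweet has been expanded: "I'll" became "I will". The head() output above still shows fragments such as "'ve", "n't" and "'s", most likely because tokenization had already split those contractions before this step ran.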
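
The order of the steps matters here: once a tokenizer has split a contraction, fragments such as "'ve" or "n't" may be left unexpanded, as in the head() output above. A toy sketch of expanding contractions on the raw string before any tokenization follows. The three-entry mapping is a hypothetical stand-in and is not how the contractions package is implemented:

```python
import re

# Hypothetical, tiny mapping for illustration only.
CONTRACTIONS = {"i'll": "I will", "didn't": "did not", "you've": "you have"}

# One alternation pattern that matches any key as a whole word.
_pattern = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    flags=re.IGNORECASE,
)


def expand_contractions(text):
    """Replace each known contraction with its expanded form."""
    return _pattern.sub(lambda m: CONTRACTIONS[m.group(0).lower()], text)


raw = "@united oh, I'll be sharing alright."
expanded = expand_contractions(raw)
print(expanded)
```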

In [192]:
# Remove special characters and punctuation
import re

def remove_special_characters(text, remove_digits=False):
    # Keep letters and whitespace; digits are kept unless remove_digits=True
    pattern = r'[^a-zA-Z0-9\s]' if not remove_digits else r'[^a-zA-Z\s]'
    return re.sub(pattern, '', text)

tweet2['text'] = tweet2['text'].apply(remove_special_characters)
tweet2.head()
Out[192]:
text airline_sentiment
0 VirginAmerica What dhepburn said neutral
1 VirginAmerica plus you ve added commercials t... positive
2 VirginAmerica I did nt today Must mean I nee... neutral
3 VirginAmerica it s really aggressive to blast... negative
4 VirginAmerica and it s a really big bad thing... negative
In [193]:
print (tweet2["text"][2040])
 united oh  I  will be sharing alright  Especially about sleeping in this shitty airport and getting hr of sleep  night because UA 

The at sign (@) is now gone. We removed the special characters and the punctuation, including the commas and quotation marks that were left over from tokenization.
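
The cleaning function is a single regular-expression substitution. A short sketch of its behaviour on a made-up variant of tweet 2040 shows the effect of the `remove_digits` flag. Applied to raw text, an apostrophe is simply deleted, so "I'll" becomes "Ill", which is one reason to expand contractions first:

```python
import re


def remove_special_characters(text, remove_digits=False):
    # Keep letters and whitespace; keep digits too unless remove_digits is set.
    pattern = r"[^a-zA-Z\s]" if remove_digits else r"[^a-zA-Z0-9\s]"
    return re.sub(pattern, "", text)


# Made-up sample sentence based on the example tweet.
sample = "@united oh, I'll be sharing alright. 1hr of sleep!"

kept_digits = remove_special_characters(sample)
no_digits = remove_special_characters(sample, remove_digits=True)

print(kept_digits)
print(no_digits)
```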

In [194]:
# e. Conversion to lowercase
tweet2 = pd.DataFrame(tweet2)

tweet2['text'] = tweet2['text'].str.lower()
print (tweet2["text"][2040])
 united oh  i  will be sharing alright  especially about sleeping in this shitty airport and getting hr of sleep  night because ua 

This is important because we don't want capitalization to hinder our ability to identify the same words.

In [195]:
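# f. Stemming or Lemmatization
#Since we are going to explore stemming and lemmatization, let's make copies of the dataframe so that we don't get confused.
#.copy() is required: plain assignment (tweet_stem = tweet2) would only add a second name for the same DataFrame,
#so stemming one would also change the text that is lemmatized later.

tweet_stem = tweet2.copy()
tweet_lemma = tweet2.copy()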
In [196]:
#Checking to see if the dataframe was copied correctly
print (tweet_stem["text"][2040])
 united oh  i  will be sharing alright  especially about sleeping in this shitty airport and getting hr of sleep  night because ua 
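
Printing the same text from a second variable does not by itself show that the data was copied, since two names for one DataFrame print identical values. A minimal sketch of the difference between assignment and `.copy()`, on a made-up two-row frame:

```python
import pandas as pd

# Made-up two-row frame standing in for tweet2.
demo = pd.DataFrame({"text": ["flights delayed", "great crew"]})

alias = demo                # a second name for the same object
independent = demo.copy()   # a separate DataFrame with its own data

# Modify the data through the alias only.
alias["text"] = alias["text"].str.upper()

print(alias is demo)                 # same object, so this prints True
print(demo["text"].tolist())         # changed through the alias
print(independent["text"].tolist())  # unchanged
```

A check such as `tweet_stem is tweet2` is therefore a more direct test of whether a copy was made.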
In [197]:
#Let's do stemming first
from nltk.stem import PorterStemmer

ps = PorterStemmer()                   #Create the stemmer once and reuse it for every tweet

def simple_stemmer(text):
    """Stem each whitespace-separated word with the Porter stemmer."""
    return ' '.join(ps.stem(word) for word in text.split())

tweet_stem['text'] = tweet_stem['text'].apply(simple_stemmer)
tweet_stem.head()
Out[197]:
text airline_sentiment
0 virginamerica what dhepburn said neutral
1 virginamerica plu you ve ad commerci to the ex... positive
2 virginamerica i did nt today must mean i need ... neutral
3 virginamerica it s realli aggress to blast obn... negative
4 virginamerica and it s a realli big bad thing ... negative
In [198]:
#Code for lemmatization.
#This is used as a comparison to stemming when we run models, to see if there are any differences in results.

import spacy
nlp = spacy.load('en_core_web_sm')


def lemmatize_text(text):
    """Replace each token with its spaCy lemma."""
    doc = nlp(text)
    # '-PRON-' is the placeholder lemma that spaCy v2 assigns to pronouns; keep the original word in that case
    return ' '.join([word.lemma_ if word.lemma_ != '-PRON-' else word.text for word in doc])

tweet_lemma["text"] = tweet_lemma["text"].apply(lemmatize_text)
In [657]:
tweet_lemma.head()
Out[657]:
text airline_sentiment
0 virginamerica what dhepburn say neutral
1 virginamerica plu you ve ad commerci to the ex... positive
2 virginamerica I do nt today must mean I need t... neutral
3 virginamerica it s realli aggress to blast obn... negative
4 virginamerica and it s a realli big bad thing ... negative

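You can tell that lemmatization had an effect because in the first document "say" replaced "said". Rows 1 to 4 still show stemmed forms such as "plu" and "commerci": in the run recorded here, the lemmatizer received text that the stemming cell had already modified.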

4. Vectorization (10 Marks):

a. Use CountVectorizer.

b. Use TfidfVectorizer.

In [199]:
# 4a. CountVectorizer
# This is a way to convert unstructured data into structured data: each tweet becomes a vector of word counts for ML model evaluation.

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(max_features=1000)                # Keep only the 1000 most frequent terms.

#Countvectorizer for stemming data
tweet_stem_vectoriz = vectorizer.fit_transform(tweet_stem['text'])
#Countvectorizer for lemmatization data (fit_transform refits the vectorizer, so its vocabulary now reflects this data)
tweet_lemma_vectoriz = vectorizer.fit_transform(tweet_lemma['text'])
 
In [201]:
# 4b. TF-IDF Vectorizer
# Like CountVectorizer, this converts text into numbers, but each count is weighted down for words that appear in many tweets.

#TF-IDF
from sklearn.feature_extraction.text import TfidfVectorizer


tfidfvectorizer = TfidfVectorizer(max_features=1000)  

#TF-IDF vectorizer for stemming data
tweet_stem_tfidf = tfidfvectorizer.fit_transform(tweet_stem['text'])
#TF-IDF vectorizer for lemmatization data
tweet_lemma_tfidf = tfidfvectorizer.fit_transform(tweet_lemma['text'])
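
To make the weighting concrete, here is a pure-Python sketch of TF-IDF on three made-up documents. It uses a smoothed inverse document frequency, ln((1 + n) / (1 + df)) + 1, followed by L2 normalisation of each row, which to our understanding matches the default settings of scikit-learn's TfidfVectorizer. It splits on whitespace only, so it is not a drop-in replacement for the vectorizer's own tokenization:

```python
import math
from collections import Counter

# Three made-up documents for illustration.
docs = [
    "flight delayed again",
    "great flight great crew",
    "crew lost bag",
]

tokenized = [doc.split() for doc in docs]
vocab = sorted({tok for doc in tokenized for tok in doc})
n_docs = len(docs)

# Document frequency: number of documents that contain each term.
doc_freq = Counter(tok for doc in tokenized for tok in set(doc))

# Smoothed inverse document frequency.
idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}


def tfidf_row(tokens):
    """Return the L2-normalised TF-IDF vector of one document."""
    counts = Counter(tokens)
    raw = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm if norm else 0.0 for v in raw]


matrix = [tfidf_row(doc) for doc in tokenized]

# Weights of the second document, one line per vocabulary term.
for term, weight in zip(vocab, matrix[1]):
    print(f"{term:8s} {weight:.3f}")
```

In the second document, "great" appears twice and occurs in no other document, so it receives a larger weight than "flight" or "crew", which each appear in two documents.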
In [202]:
#Transform all stemming/lemmatized and vectorized data into arrays


data_features_stem_vec = tweet_stem_vectoriz.toarray()        #Stemming-CountVectorized
data_features_lemma_vec = tweet_lemma_vectoriz.toarray()      #Lemmatization-CountVectorized 
data_features_stem_tfidf = tweet_stem_tfidf.toarray()         #Stemming-TF-IDF
data_features_lemma_tfidf = tweet_lemma_tfidf.toarray()        #Lemmatization-TF-IDF
In [493]:
#Checking to see if TF-IDF was successful
  
data_features_stem_tfidf[10]
Out[493]:
array([0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.161396  , 0.        , 0.39207673,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.24953449, 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.37486809, 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.24890264, 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.53710519,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.26537536, 0.17545897, 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.35998184, 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.19026889, 0.        , 0.        , 0.        ])
In [659]:
#Checking to see if CountVectorizer was successful

 
data_features_stem_vec[10]
Out[659]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
       0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 0], dtype=int64)
In [204]:
#Create simpler name for the labels (outcome)
labels_stem = tweet_stem['airline_sentiment']
labels_lemma = tweet_lemma['airline_sentiment']

5. Fit and evaluate the model using both types of vectorization. (6+6 Marks)

In [205]:
#Split data into training and testing set for stemming/lemmatized/CountVectorized/TF-IDF data

from sklearn.model_selection import train_test_split


#Stemming-CountVectorizer Data
X_train_stem_vec, X_test_stem_vec, y_train_stem_vec, y_test_stem_vec = train_test_split(data_features_stem_vec, labels_stem, test_size=0.3, random_state=42)
#Lemmatization-CountVectorizer Data
X_train_lemma_vec, X_test_lemma_vec, y_train_lemma_vec, y_test_lemma_vec = train_test_split(data_features_lemma_vec, labels_lemma, test_size=0.3, random_state=42)
#Stemming- TF-IDF Data
X_train_stem_tfidf, X_test_stem_tfidf, y_train_stem_tfidf, y_test_stem_tfidf = train_test_split(data_features_stem_tfidf, labels_stem, test_size=0.3, random_state=42)
#Lemmatization-TF-IDF Data
X_train_lemma_tfidf, X_test_lemma_tfidf, y_train_lemma_tfidf, y_test_lemma_tfidf = train_test_split(data_features_lemma_tfidf, labels_lemma, test_size=0.3, random_state=42)

Model of Choice:

Logistic Regression Model #1: Using CountVectorizer

In [266]:
#Logistic Regression using data that went through stemming and CountVectorization

from sklearn.linear_model import LogisticRegression                     # Importing logistic regression

lr = LogisticRegression()

lr_stem_vec= lr.fit(X_train_stem_vec, y_train_stem_vec)
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:762: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
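
The ConvergenceWarning above means that lbfgs stopped at its iteration limit (the scikit-learn default is max_iter=100) before it converged. Below is a minimal sketch of the first remedy the warning suggests, raising max_iter. The synthetic data and the value 1000 are assumptions for illustration only, not settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic 3-class data standing in for the vectorized tweets (assumption).
X_demo, y_demo = make_classification(n_samples=300, n_features=20, n_informative=5,
                                     n_classes=3, random_state=42)

# Raise the iteration cap above the default of 100 so lbfgs has room to converge.
lr_more_iter = LogisticRegression(max_iter=1000)
lr_more_iter.fit(X_demo, y_demo)

# n_iter_ holds the iterations actually used; a value below max_iter means the solver converged.
print(lr_more_iter.n_iter_)
```

If the warning persists with a larger cap, the other options named in the warning (scaling the features or choosing a different solver) are the next things to try.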
In [559]:
# Accuracy score after K-Fold Cross-validation on training set.
# Model's accuracy is based on the average of each fold.

lr_stem_vec_train = np.mean(cross_val_score(lr_stem_vec,X_train_stem_vec,y_train_stem_vec,cv=10))
lr_stem_vec_train
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:762: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
Out[559]:
0.7862981135670732

After 10-fold cross-validation, the accuracy of the model is 79%. This is a big improvement over all the models I tuned at the end of this project.

Because K-Fold cross-validation scores the model on data held out from each fit, the averaged accuracy is less likely to hide overfitting.
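
To make the cross-validation step concrete, here is a small sketch of what cross_val_score does with cv=10: it clones the estimator, fits each clone on nine folds, and scores it on the remaining fold. The synthetic data and the max_iter value are assumptions for illustration, not the project's data or settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic 3-class data standing in for the vectorized tweets (assumption).
X_cv, y_cv = make_classification(n_samples=500, n_features=20, n_informative=5,
                                 n_classes=3, random_state=42)

cv_model = LogisticRegression(max_iter=1000)

# Ten fits on clones of cv_model, each scored on the fold that was held out of that fit.
fold_scores = cross_val_score(cv_model, X_cv, y_cv, cv=10)

print(fold_scores.round(3))                       # one accuracy per fold
print("mean accuracy:", fold_scores.mean())       # the figure reported in this notebook
print("spread across folds:", fold_scores.std())  # how stable the estimate is
```

Because the scoring is done on clones, the original estimator object is not modified, and looking at the spread across folds alongside the mean gives a sense of how much the estimate depends on the particular split.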

In [560]:
#Test this first logistic regression model on the test data
lr_stem_vec_test = lr_stem_vec.score(X_test_stem_vec, y_test_stem_vec)
lr_stem_vec_test
Out[560]:
0.8010018214936248

Strangely enough, the logistic regression model was strong when we tested it on the test data. It showed an 80% accuracy on the test data, one of the highest of all the models I created.

In [661]:
#Get the model to make predictions.

lr_result_stem_vec = lr_stem_vec.predict(X_test_stem_vec)
lr_result_stem_vec
Out[661]:
array(['positive', 'negative', 'negative', ..., 'negative', 'negative',
       'negative'], dtype=object)
In [662]:
#Let's create the confusion matrix for this first logistic regression model.

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

conf_mat_lg1 = confusion_matrix(y_test_stem_vec, lr_result_stem_vec)

print(conf_mat_lg1)

df_cm = pd.DataFrame(conf_mat_lg1, index = [i for i in "123"],
                  columns = [i for i in "123"])
plt.figure(figsize = (10,7))
sns.heatmap(df_cm, annot=True, fmt='g')
[[2500  226   88]
 [ 263  539   82]
 [ 115  100  479]]
Out[662]:
<matplotlib.axes._subplots.AxesSubplot at 0x17a90648040>

Here, 1 = "Negative", 2 = "Neutral", and 3 = "Positive".

Columns = Predictions

Rows = Actual


From this model, 2500 tweets were correctly predicted as "negative", 539 as "neutral", and 479 as "positive", resulting in an 80% accuracy on unseen test data.

Let's discuss the precision and recall for all of the sentiment types:

Negative sentiment: Precision: 2500/(2500+263+115) = 0.87 (87%), Recall: 2500/(2500+226+88) = 0.89 (89%)

Neutral sentiment: Precision: 539/(226+539+100) = 0.62 (62%), Recall: 539/(263+539+82) = 0.61 (61%)

Positive sentiment: Precision: 479/(88+82+479) = 0.74 (74%), Recall: 479/(115+100+479) = 0.69 (69%)
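
The hand calculations above can be reproduced directly from the confusion matrix: precision divides each diagonal entry by its column sum (everything predicted as that class), and recall divides it by its row sum (everything that actually is that class). A short sketch in plain Python, using the matrix printed above; the variable names are illustrative only.

```python
# Confusion matrix from the CountVectorizer logistic regression
# (rows = actual, columns = predicted).
conf_mat = [
    [2500, 226, 88],   # actual negative
    [263, 539, 82],    # actual neutral
    [115, 100, 479],   # actual positive
]
class_names = ["negative", "neutral", "positive"]

total = sum(sum(row) for row in conf_mat)
correct = sum(conf_mat[i][i] for i in range(3))
accuracy = correct / total

metrics = {}
for i, name in enumerate(class_names):
    predicted_as_class = sum(conf_mat[r][i] for r in range(3))  # column sum
    actually_class = sum(conf_mat[i])                           # row sum
    metrics[name] = {
        "precision": conf_mat[i][i] / predicted_as_class,
        "recall": conf_mat[i][i] / actually_class,
    }

print(f"accuracy: {accuracy:.2f}")
for name, m in metrics.items():
    print(f"{name}: precision={m['precision']:.2f}, recall={m['recall']:.2f}")
```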

In general, the logistic regression model performed very well: it classified 80% of the test tweets correctly.

When it comes to its predictions for each sentiment type, it performed extraordinarily well with negative sentiments. Of all the negative predictions, 87% were actually accurate. Recall for negative sentiment was even higher; of all the actual negative sentiments, 89% were predicted accurately. In detecting positive sentiments, logistic regression performed decently: 74% of all positive predictions were truly correct while 69% of true positive sentiments were predicted accurately.

The model faltered mostly in predicting neutral sentiments. Precision for neutral sentiments was only 62% and recall was even lower at 61%. This goes to show that the model is strong in predicting extreme sentiments but does not perform as well on messages that lie in the neutral zone.


2nd logistic regression Model: Using TF-IDF

In [497]:
#Logistic Regression using TF-IDF data

lr = LogisticRegression()

lr_stem_tfidf= lr.fit(X_train_stem_tfidf, y_train_stem_tfidf)       #We will continue using the stemming data just for comparison to the previous model.
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:762: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
In [498]:
#K-Fold cross validation accuracy score.
#Accuracy is the average of each fold's accuracy

lr_stem_tfidf_train = np.mean(cross_val_score(lr_stem_tfidf,X_train_stem_tfidf,y_train_stem_tfidf,cv=10))
lr_stem_tfidf_train
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:762: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
Out[498]:
0.7933234565548781

The cross-validated accuracy for this second logistic regression model using TF-IDF data (79.3%) is only slightly higher than that of the first model (78.6%). TF-IDF weights each word by how often it appears in a tweet and down-weights words that appear in many tweets, whereas CountVectorizer uses raw counts, so TF-IDF is often expected to give somewhat better results. This higher accuracy is therefore not unexpected.
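
To illustrate the difference between the two vectorizers, here is a small sketch on a toy corpus. The three example sentences are hypothetical and are not taken from Tweets.csv; both vectorizers are used with their default settings.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy corpus (hypothetical tweets, not from the dataset).
toy_docs = [
    "flight delayed again",
    "flight cancelled no refund",
    "great flight great crew",
]

count_vec = CountVectorizer()
toy_counts = count_vec.fit_transform(toy_docs).toarray()

tfidf_vec = TfidfVectorizer()
toy_tfidf = tfidf_vec.fit_transform(toy_docs).toarray()

# Column positions of a few words in each vocabulary.
great_c = count_vec.vocabulary_["great"]
flight_t = tfidf_vec.vocabulary_["flight"]
crew_t = tfidf_vec.vocabulary_["crew"]

# CountVectorizer stores raw integer counts: "great" occurs twice in the third document.
print("count of 'great' in doc 3:", toy_counts[2, great_c])

# TF-IDF stores floats. "flight" and "crew" both occur once in doc 3, but "flight"
# appears in every document, so it receives the smaller weight.
print("tf-idf of 'flight' in doc 3:", toy_tfidf[2, flight_t])
print("tf-idf of 'crew' in doc 3:", toy_tfidf[2, crew_t])
```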

In [499]:
#Test this 2nd logistic regression model (TF-IDF) on the test data
lr_stem_tfidf_test = lr_stem_tfidf.score(X_test_stem_tfidf, y_test_stem_tfidf)
lr_stem_tfidf_test
Out[499]:
0.8085154826958105

As expected, there is a high accuracy (81%) with no real overfitting issues.

Both CountVectorizer and TF-IDF yielded very similar results on training and testing data for logistic regression. The TF-IDF dataset had a test accuracy of 81% while the CountVectorizer data had a test accuracy of 80%.

In [500]:
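#Let's look at the predictions with the TF-IDF Vectorized data.
#Predict with the TF-IDF-trained model (lr_stem_tfidf). The recorded run called lr_stem_vec.predict here,
#which applies the CountVectorizer-trained model to TF-IDF features.
lr_result_stem_tfidf = lr_stem_tfidf.predict(X_test_stem_tfidf)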
lr_result_stem_tfidf
Out[500]:
array(['positive', 'neutral', 'negative', ..., 'neutral', 'negative',
       'neutral'], dtype=object)
In [502]:
#Let's look at the confusion matrix for the Stemming-TF-IDF Vectorized data.

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

conf_mat2 = confusion_matrix(y_test_stem_tfidf, lr_result_stem_tfidf)

print(conf_mat2)

df_cm2 = pd.DataFrame(conf_mat2, index = [i for i in "123"],
                  columns = [i for i in "123"])
plt.figure(figsize = (10,7))
sns.heatmap(df_cm2, annot=True, fmt='g')
[[1320 1470   24]
 [  35  833   16]
 [  44  386  264]]
Out[502]:
<matplotlib.axes._subplots.AxesSubplot at 0x17a88e78970>

Here, 1 = "Negative", 2 = "Neutral", and 3 = "Positive".

Columns = Predictions

Rows = Actual


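From this matrix, 1320 tweets were correctly predicted as "negative", 833 as "neutral", and 264 as "positive". That is 2417 of the 4392 test tweets (55%), which does not agree with the 81% test accuracy returned by lr_stem_tfidf.score above. The recorded predictions were generated with lr_stem_vec (the CountVectorizer-trained model) on the TF-IDF features, so this matrix describes that combination rather than the TF-IDF-trained model.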

Let's discuss the precision and recall for all of the sentiment types:

Negative sentiment: Precision: 1320/(1320+35+44) = 0.94 (94%), Recall: 1320/(1320+1470+24) = 0.47 (47%)

Neutral sentiment: Precision: 833/(1470+833+386) = 0.31 (31%), Recall: 833/(35+833+16) = 0.94 (94%)

Positive sentiment: Precision: 264/(24+16+264) = 0.87 (87%), Recall: 264/(44+386+264) = 0.38 (38%)

For this model, we used the Stemming-TF-IDF Vectorizer data. Although it has a 1% higher accuracy than the CountVectorizer model on the training dataset, I don't think this model performed as well. The precision and recall values are at extremes for all sentiment types. For positive and negative sentiments, the precision values were very strong (0.87 (87%) and 0.94 (94%), respectively) but the recall values bombed (0.38 (38%) and 0.47 (47%), respectively). This suggests that of all the positive predictions, the model got a high proportion right. However, of all the truly positive Tweets, only a small proportion were predicted correctly. We see the same trend for negative sentiments as well.

The opposite holds true for neutral sentiments: high recall (0.94 (94%)) but very low precision (0.31 (31%)). Among all the truly neutral Tweets, this model predicted a high proportion correctly, but of all the neutral predictions, only 31% were actually neutral.

Final comparison between both models:

The CountVectorizer model seemed to perform better than the TF-IDF model for logistic regression. The CountVectorizer model had recall and precision values that were all above 60%, which made it a lot more balanced and reliable. The TF-IDF model, on the other hand, delivered some very high metric scores but also some very low ones.

If I had to deploy one of these models, I would choose the logistic regression-CountVectorizer model.

6. Summarize your understanding of the application of Various Pre-processing and Vectorization and performance of your model on this dataset. (8 Marks)

Pre-Processing:

All the pre-processing steps are crucial because together they clean the data. Data scientists use them to reduce noise and to discard items that carry little meaning, such as special characters like the at sign (@), punctuation like the comma (,), and HTML markup. Pre-processing also groups equivalent words together: lowercasing merges 'GOOD' and 'good', and lemmatization can merge forms such as 'better' and 'good'. These are fundamental steps for removing dimensions that do not matter.

The pre-processing steps whose value to the models I can most readily attest to are stemming and lemmatization. They performed almost exactly the same on this data even though they work differently. Stemming strips affixes (mostly suffixes) by rule so that related words share a common stem. Lemmatization uses a dictionary to map each word to its base form (lemma).

Vectorization:

What does make a difference in the performance is the type of vectorization used. CountVectorizer seemed to perform better in terms of accuracy with most models tried here, except for logistic regression and K Nearest Neighbors. For these exceptions, TF-IDF was a better choice if you're only considering accuracy scores. However, after evaluating precision and recall, CountVectorizer models seem more dependable than TF-IDF at making predictions.

It is important to note that CountVectorizer transforms text into non-negative integer word counts (mostly 0s and 1s for short tweets), which are discrete values. TF-IDF, on the other hand, assigns each word a floating-point weight based on its frequency: less weight goes to words that appear in many tweets and more weight to words that appear in few. For the Naive Bayes models (towards the end of the document), there is MultinomialNB() and there is GaussianNB(). MultinomialNB() works better with discrete values and GaussianNB() works better with continuous numerical data. I used GaussianNB() on the TF-IDF X_train data and MultinomialNB() on the CountVectorizer data. The MultinomialNB()-CountVectorizer model performed a lot better, with 78% accuracy compared to 37% for the GaussianNB()-TF-IDF model.
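
One caveat on the comparison above: it changes both the vectorizer and the classifier at once. The scikit-learn documentation notes that although the multinomial model normally expects integer counts, fractional values such as TF-IDF weights may also work in practice, so MultinomialNB can be fitted on both feature sets. The sketch below shows this on a toy corpus; the sentences and labels are hypothetical, and the scores on six training sentences say nothing about this project's data.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy labelled corpus (hypothetical, not from Tweets.csv).
nb_docs = [
    "flight delayed again terrible",
    "lost my bag terrible service",
    "what time does boarding start",
    "which gate for this flight",
    "great crew thank you",
    "smooth flight great service",
]
nb_labels = ["negative", "negative", "neutral", "neutral", "positive", "positive"]

nb_count_features = CountVectorizer().fit_transform(nb_docs)   # integer counts
nb_tfidf_features = TfidfVectorizer().fit_transform(nb_docs)   # float weights

# Same classifier on both representations, so only the vectorizer differs.
nb_on_counts = MultinomialNB().fit(nb_count_features, nb_labels)
nb_on_tfidf = MultinomialNB().fit(nb_tfidf_features, nb_labels)

print("counts :", nb_on_counts.score(nb_count_features, nb_labels))
print("tf-idf :", nb_on_tfidf.score(nb_tfidf_features, nb_labels))
```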

Model Performance:

I originally expected Naive Bayes or Support Vector Machine to be the top contenders for this project because when it comes to natural language processing, these models tend to perform very well (Neural networks also). For this project, SVM came close. Despite having the highest accuracy of all models, it bombed a lot more than logistic regression using CountVectorizer on other important classification metrics. It wasn't as reliable of a model (See Support Vector Machine Models 4 and 5 in Hyperparameter Tuning Section near the end of this document).

Logistic regression using CountVectorizer data is my model of choice among all the options here. Support Vector Machine may be 2nd and AdaBoost 3rd. Despite not having the absolute highest accuracy, logistic regression with CountVectorizer had decent recall scores, especially for positive and neutral Tweets. Recall is, in my opinion, a more important metric than accuracy in this instance because the classes are not balanced: the majority of tweets in this dataset, 63% to be exact, are negative.

It is so important to consider 1-Recall especially because this indicates how many negative, neutral or positive Tweets your model missed. For my model of choice, recall for neutral tweets was: 61%, recall for positive tweets was: 69% and recall for negative tweets was: 89%. This means that the model will miss 39% of the neutral tweets, and 31% of the positive tweets, and 11% of the negative tweets. This is a lot better than the results from the second logistic regression model using TF-IDF, where it is expected to miss 53% of the negative tweets (recall: 47%), 6% of the neutral tweets (recall: 94%), and 62% of the positive tweets (recall: 38%). Improvement in recall and 1-recall makes a far better model even if I have to sacrifice 1% in accuracy.

In [671]:
# Accuracy Table of all Machine Learning Models:
# Highest Train accuracy from K-Fold Cross Validation, and Highest Test Accuracy Recorded For Each Model Type

Accuracy_Table ={'ML Model':['Logistic Regression', 'Logistic Regression' , 'Support Vector Machine', 'Support Vector Machine', 'Random Forest', 'Decision Tree', 'Bagging', 'AdaBoost', 'Gradient Boost', 'K Nearest Neighbors', 'Naive Bayes'], 'Vectorization Method': ['CountVectorizer', 'TF-IDF', 'CountVectorizer', 'TF-IDF', 'CountVectorizer', 'CountVectorizer', 'CountVectorizer', 'CountVectorizer', 'CountVectorizer', 'TF-IDF', 'CountVectorizer'], 'Train_Accuracy':[lr_stem_vec_train, lr_stem_tfidf_train, svm_rbf_stem_vec_train, svm_rbf_stem_tfidf_train, forest_lemma_vec_train, dt_stem_vec_train, bag_stem_vec_train, abcl_stem_vec_train, gbcl_stem_vec_train, knc_stem_tfidf_train, mnb_stem_vec_train], 'Test_Accuracy':[lr_stem_vec_test, lr_stem_tfidf_test, svm_rbf_stem_vec_test, svm_rbf_stem_tfidf_test, forest_lemma_vec_test, dt_stem_vec_test, bag_stem_vec_test, abcl_stem_vec_test, gbcl_stem_vec_test, knc_stem_tfidf_test, mnb_stem_vec_test], 'Notes': ['Choice Model', 'High accuracy, low recall for positive and neutral Tweets', 'High accuracy, low recall for positive and neutral Tweets', 'High accuracy, low recall for positive and neutral tweets', '', '', '', '', '', '', ', ']}
Accuracy_Table_= pd.DataFrame(Accuracy_Table)
Accuracy_Table_
Out[671]:
    ML Model                Vectorization Method  Train_Accuracy  Test_Accuracy  Notes
0   Logistic Regression     CountVectorizer       0.786298        0.801002       Choice Model
1   Logistic Regression     TF-IDF                0.793323        0.808515       High accuracy, low recall for positive and neu...
2   Support Vector Machine  CountVectorizer       0.789326        0.811248       High accuracy, low recall for positive and neu...
3   Support Vector Machine  TF-IDF                0.794984        0.809882       High accuracy, low recall for positive and neu...
4   Random Forest           CountVectorizer       0.764050        0.782332
5   Decision Tree           CountVectorizer       0.698770        0.715847
6   Bagging                 CountVectorizer       0.758683        0.773224
7   AdaBoost                CountVectorizer       0.765221        0.786202
8   Gradient Boost          CountVectorizer       0.756927        0.766621
9   K Nearest Neighbors     TF-IDF                0.720434        0.724954
10  Naive Bayes             CountVectorizer       0.772930        0.780282       ,

Ways to improve:

1. Downsampling the number of negative tweets.

Downsampling the number of negative tweets would lower the bias towards the larger class. This could improve model performance and better the precision and recall metrics.
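
A minimal sketch of downsampling with pandas, assuming it is applied to the training rows only so the test set keeps the real class distribution. The toy frame is hypothetical; only the column name airline_sentiment comes from the dataset.

```python
import pandas as pd

# Toy training frame with the same kind of imbalance (hypothetical rows).
toy_train = pd.DataFrame({
    "text": [f"tweet {i}" for i in range(10)],
    "airline_sentiment": ["negative"] * 6 + ["neutral"] * 2 + ["positive"] * 2,
})

# Size of the smallest class.
smallest_class = int(toy_train["airline_sentiment"].value_counts().min())

# Draw that many rows from every class so all classes end up the same size.
balanced_train = toy_train.groupby("airline_sentiment").sample(
    n=smallest_class, random_state=42
)

print(balanced_train["airline_sentiment"].value_counts())
```

The trade-off is that the discarded negative tweets are no longer available for training, so the effect on precision and recall would need to be checked with cross-validation.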

2. Try a neural network model.

Neural networks have had strong success in natural language processing. I would have tried one, but the pre-processing steps are different. Neural networks typically take dense word embeddings, prepared with tools like Word2vec, GloVe and fastText, so the data has to be converted from sparse (like what we have here) to dense. Sparse data contains mostly zeroes, like what we see in the TF-IDF transformed data.

3. Implement a list of stopwords.

This could further help to diminish the amount of noise in textual data, and remove features that don't carry much meaning.
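
One low-effort way to try this is the stop_words option of CountVectorizer, which uses scikit-learn's built-in English list. The sketch below uses two hypothetical sentences; an NLTK stopword list or a custom list could be passed instead.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical example sentences, not from Tweets.csv.
stop_docs = [
    "the flight was late and the crew was rude",
    "thanks for the great flight",
]

vocab_all = CountVectorizer().fit(stop_docs).vocabulary_
vocab_filtered = CountVectorizer(stop_words="english").fit(stop_docs).vocabulary_

print("all words      :", sorted(vocab_all))
print("stopwords gone :", sorted(vocab_filtered))
```

For sentiment analysis it is worth inspecting the list before using it, since general-purpose stopword lists often include negation words that can change the meaning of a tweet.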

4. Create a mixed model.

Stacking and Blending are techniques that allow you to take advantage of the strengths of multiple model types like random forest, logistic regression, etc. For blending models, you can assign weights to each method and capitalize on better accuracy and better metric results.
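
As a rough illustration of stacking, scikit-learn's StackingClassifier fits several base models and then trains a final model on their cross-validated predictions. The choice of base models, the synthetic data, and all parameter values below are assumptions for the sketch, not a configuration tested in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic 3-class data standing in for the vectorized tweets (assumption).
X_stack, y_stack = make_classification(n_samples=400, n_features=20, n_informative=5,
                                       n_classes=3, random_state=42)
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_stack, y_stack, test_size=0.3, random_state=42
)

stack = StackingClassifier(
    estimators=[
        ("forest", RandomForestClassifier(n_estimators=50, random_state=42)),
        ("nb", GaussianNB()),
    ],
    # The final estimator learns how to combine the base models' predictions.
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(Xs_train, ys_train)

stack_test_accuracy = stack.score(Xs_test, ys_test)
print("stacked test accuracy:", stack_test_accuracy)
```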

5. Applying PCA for dimensionality reduction.

Since our data is originally unstructured and consists of words, there is a lot to process. After it is tokenized and has become a bag of words, the number of features is large. One way to reduce the number of unnecessary features would be to apply PCA, which lowers dimensionality, saves time, and makes the data easier for the model to process.
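
For sparse bag-of-words features, a commonly used stand-in for PCA is TruncatedSVD, which works directly on sparse matrices without centering them first. The sketch below swaps in TruncatedSVD rather than PCA itself; the toy sentences and the choice of two components are assumptions for illustration.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical example sentences, not from Tweets.csv.
svd_docs = [
    "flight delayed again",
    "flight cancelled no refund",
    "great flight great crew",
    "lost my bag at the gate",
    "thank you for the smooth flight",
]

svd_tfidf = TfidfVectorizer().fit_transform(svd_docs)   # sparse matrix, one column per word

# Project the word columns down to 2 components.
svd = TruncatedSVD(n_components=2, random_state=42)
svd_reduced = svd.fit_transform(svd_tfidf)

print(svd_tfidf.shape, "->", svd_reduced.shape)
print("variance retained:", svd.explained_variance_ratio_.sum())
```

On the real data the number of components would need tuning, for example by checking how much variance is retained and how cross-validated accuracy changes.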


Extra:

Trying Different Model Types and Hyperparameter Tuning with Comments:

In order to prevent data leakage, I only applied the model that performed the best through k-fold cross-validation to the testing data once I was satisfied with hyperparameter tuning.

Random Forest 1st Model:

In [541]:
#Random Forest for Stemming-CountVectorizer Data

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
import numpy as np
# Initialize Random Forest classifier, 100 trees
forest = RandomForestClassifier(verbose=2,n_jobs=4,n_estimators = 100) 

print ("Training the random forest...")
forest_stem_vec = forest.fit(X_train_stem_vec, y_train_stem_vec)
# random forest performance through k-fold cross validation 
print (forest_stem_vec)
Training the random forest...
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
building tree 1 of 100
...
building tree 100 of 100
RandomForestClassifier(n_jobs=4, verbose=2)
[Parallel(n_jobs=4)]: Done 100 out of 100 | elapsed:    3.3s finished
In [542]:
forest_stem_vec_train = np.mean(cross_val_score(forest_stem_vec,X_train_stem_vec,y_train_stem_vec,cv=10))
forest_stem_vec_train
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:   15.8s
[Parallel(n_jobs=4)]: Done 100 out of 100 | elapsed:   17.7s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 100 out of 100 | elapsed:    0.0s finished
Out[542]:
0.7615131478658537
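
The value above is the mean over 10 folds. A mean alone hides how much the folds differ, so it can help to look at the per-fold scores and their standard deviation before comparing feature sets whose means differ only slightly. The sketch below uses synthetic data and a hypothetical `forest_demo` estimator. It also shows that `cross_val_score` fits clones of the estimator, leaving the object passed in unfitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data standing in for a vectorized training set (assumption).
X, y = make_classification(n_samples=200, n_features=15, n_informative=5,
                           random_state=0)

# Hypothetical estimator; it is never fitted directly in this sketch.
forest_demo = RandomForestClassifier(n_estimators=20, random_state=0)

# One accuracy value per fold; an integer cv with a classifier uses stratified folds.
fold_scores = cross_val_score(forest_demo, X, y, cv=10)
print(fold_scores.shape)
print(np.mean(fold_scores), np.std(fold_scores))

# cross_val_score fits clones, so forest_demo has no fitted trees afterwards.
is_fitted = hasattr(forest_demo, "estimators_")
print(is_fitted)
```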

Next, I change the input data (and later some hyperparameters) to see whether the mean k-fold cross-validation accuracy improves.

2nd Random Forest Model:

In [668]:
#Use the same Random Forest settings on the lemmatization-CountVectorizer data

# This refits the same forest object; forest_lemma_vec is another name for it
forest_lemma_vec = forest.fit(X_train_lemma_vec, y_train_lemma_vec)
# Print the fitted estimator; cross-validation is run in the next cell
print(forest_lemma_vec)
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
building tree 1 of 100
...
building tree 100 of 100
RandomForestClassifier(n_jobs=4, verbose=2)
[Parallel(n_jobs=4)]: Done 100 out of 100 | elapsed:    3.0s finished
In [669]:
forest_lemma_vec_train = np.mean(cross_val_score(forest_lemma_vec,X_train_lemma_vec,y_train_lemma_vec,cv=10))
forest_lemma_vec_train
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    2.0s
[Parallel(n_jobs=4)]: Done 100 out of 100 | elapsed:    3.8s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 100 out of 100 | elapsed:    0.0s finished
Out[669]:
0.7640504001524391
In [670]:
#Test this model (lemmatization-CountVectorizer) on the test data because it has the highest mean accuracy from cross-validation:
forest_lemma_vec_test = forest_lemma_vec.score(X_test_lemma_vec, y_test_lemma_vec)
forest_lemma_vec_test
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 100 out of 100 | elapsed:    0.0s finished
Out[670]:
0.7823315118397086
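
One detail worth keeping in mind here: `fit` returns the estimator itself, so names such as `forest_stem_vec` and `forest_lemma_vec` refer to the same `forest` object, which holds only its most recent fit. The test score is therefore only meaningful if it is taken before `forest` is refit on other data. A minimal sketch of the aliasing, and of keeping independent fitted models with `sklearn.base.clone`, on synthetic data with hypothetical names:

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Two synthetic datasets with different feature counts (assumption, for illustration).
X_a, y_a = make_classification(n_samples=100, n_features=8, random_state=0)
X_b, y_b = make_classification(n_samples=100, n_features=12, random_state=1)

base = RandomForestClassifier(n_estimators=10, random_state=0)

# fit() returns the estimator itself, so both names point at one object.
model_a = base.fit(X_a, y_a)
model_b = base.fit(X_b, y_b)
same_object = model_a is model_b
print(same_object)

# clone() copies the hyperparameters into a new, unfitted estimator,
# so each fitted model is kept separately.
indep_a = clone(base).fit(X_a, y_a)
indep_b = clone(base).fit(X_b, y_b)
preds_a = indep_a.predict(X_a)
print(indep_a is indep_b, preds_a.shape)
```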

3rd Random Forest Model:

In [547]:
#Use the same Random Forest settings on the stemming-TF-IDF data

# This refits the same forest object; forest_stem_tfidf is another name for it
forest_stem_tfidf = forest.fit(X_train_stem_tfidf, y_train_stem_tfidf)
# Print the fitted estimator; cross-validation is run in the next cell
print(forest_stem_tfidf)
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
building tree 1 of 100
...
building tree 100 of 100
[Parallel(n_jobs=4)]: Done 100 out of 100 | elapsed:    3.7s finished
RandomForestClassifier(n_jobs=4, verbose=2)
In [548]:
forest_stem_tfidf_train = np.mean(cross_val_score(forest_stem_tfidf,X_train_stem_tfidf,y_train_stem_tfidf,cv=10))
forest_stem_tfidf_train
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    2.2s
[Parallel(n_jobs=4)]: Done 100 out of 100 | elapsed:    4.6s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 100 out of 100 | elapsed:    0.0s finished
Out[548]:
0.7611224275914634

4th Random Forest Model:

In [550]:
#Use the same Random Forest settings on the lemmatization-TF-IDF data
forest_lemma_tfidf = forest.fit(X_train_lemma_tfidf, y_train_lemma_tfidf)
# Print the fitted estimator; cross-validation is run in the next cell
print(forest_lemma_tfidf)
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
building tree 1 of 100
...
building tree 100 of 100
RandomForestClassifier(n_jobs=4, verbose=2)
[Parallel(n_jobs=4)]: Done 100 out of 100 | elapsed:    3.6s finished
In [551]:
forest_lemma_tfidf_train = np.mean(cross_val_score(forest_lemma_tfidf,X_train_lemma_tfidf,y_train_lemma_tfidf,cv=10))
forest_lemma_tfidf_train
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  58 tasks      | elapsed:    2.1s
[Parallel(n_jobs=4)]: Done 100 out of 100 | elapsed:    3.5s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 100 out of 100 | elapsed:    0.0s finished
Out[551]:
0.7607323742378049

RandomizedSearchCV for Random Forest to see if we can make improvements:
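
Before running a randomized search, it is worth checking that every value in the search space is valid for the estimator. In scikit-learn, `n_estimators` must be at least 1 and `max_leaf_nodes` must be `None` or greater than 1, and `scipy.stats.randint(low, high)` samples from `low` up to but not including `high`. The following is a sketch of a search restricted to valid values. It runs on synthetic data, and the specific ranges, `n_iter`, and `cv` are illustrative assumptions rather than tuned choices.

```python
from scipy.stats import randint as sp_randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic three-class data standing in for the TF-IDF features (assumption).
X_demo, y_demo = make_classification(n_samples=200, n_features=20,
                                     n_informative=5, n_classes=3,
                                     random_state=0)

# Every value here is accepted by RandomForestClassifier.
valid_param_dist = {
    "max_depth": [3, None],
    "n_estimators": sp_randint(10, 50),    # integers 10..49, never 0
    "max_leaf_nodes": [10, None],          # None or an integer greater than 1
    "min_samples_leaf": sp_randint(1, 4),  # integers 1..3
    "bootstrap": [True, False],
    "criterion": ["gini", "entropy"],
}

# error_score="raise" surfaces an invalid setting immediately
# instead of recording a nan score for that candidate.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=valid_param_dist,
    n_iter=5, cv=3, random_state=0, error_score="raise")
search.fit(X_demo, y_demo)
print(search.best_params_)
print(search.best_score_)
```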

In [636]:
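from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint as sp_randint

# Search space. Note: sp_randint(low, high) samples integers in [low, high), so
# n_estimators can be drawn as 0, and max_leaf_nodes=1 is rejected by scikit-learn
# (it must be None or larger than 1). Candidates with these values fail to fit
# and are scored as nan.
rf_random_param_dist = {"max_depth": [3, None],
              "n_estimators": sp_randint(0, 200),
              "max_leaf_nodes": [1, None],
              "min_samples_leaf": sp_randint(1, 3),  # samples 1 or 2
              "bootstrap": [True, False],
              "criterion": ["gini", "entropy"]}

# run randomized search
samples = 20  # number of random parameter settings to try
# cv is left at its default (5-fold since scikit-learn 0.22)
randomCV = RandomizedSearchCV(forest, param_distributions=rf_random_param_dist, n_iter=samples)

randomCV.fit(X_train_stem_tfidf, y_train_stem_tfidf)

print(randomCV.best_params_)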
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 431, in _process_worker
    r = call_item()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in __call__
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in <listcomp>
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 170, in _parallel_build_trees
    tree.fit(X, y, sample_weight=sample_weight, check_input=False)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 386, in fit
    trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 1042, in __call__
    self.retrieve()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 921, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 439, in result
    return self.__get_result()
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 431, in _process_worker
    r = call_item()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in __call__
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in <listcomp>
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 170, in _parallel_build_trees
    tree.fit(X, y, sample_weight=sample_weight, check_input=False)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 386, in fit
    trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 1042, in __call__
    self.retrieve()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 921, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 439, in result
    return self.__get_result()
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 431, in _process_worker
    r = call_item()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in __call__
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in <listcomp>
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 168, in _parallel_build_trees
    tree.fit(X, y, sample_weight=curr_sample_weight, check_input=False)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 386, in fit
    trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 1042, in __call__
    self.retrieve()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 921, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 439, in result
    return self.__get_result()
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 431, in _process_worker
    r = call_item()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in __call__
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in <listcomp>
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 170, in _parallel_build_trees
    tree.fit(X, y, sample_weight=sample_weight, check_input=False)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 386, in fit
    trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 1042, in __call__
    self.retrieve()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 921, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 439, in result
    return self.__get_result()
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
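
Note on the warning above: the fit fails because a sampled max_leaf_nodes value was 1, and scikit-learn requires None or an integer larger than 1. The search space itself is not shown in this output, so the following is only a minimal sketch under an assumption: that max_leaf_nodes is drawn from an integer distribution whose lower bound is too small. The range (2, 50) and the name max_leaf_nodes_dist are hypothetical.

```python
from scipy.stats import randint as sp_randint

# Hypothetical search range; the notebook's actual bounds are not shown here.
# sp_randint(low, high) draws integers from low up to high - 1.
max_leaf_nodes_dist = sp_randint(2, 50)

# Draw a reproducible sample and confirm no invalid value (< 2) can appear.
samples = max_leaf_nodes_dist.rvs(size=1000, random_state=0)
print(samples.min(), samples.max())
```

Starting the range at 2 means every candidate the search draws is one the tree can be fitted with, so no train-test partition is scored as nan for this reason.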
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 431, in _process_worker
    r = call_item()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in __call__
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in <listcomp>
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 168, in _parallel_build_trees
    tree.fit(X, y, sample_weight=curr_sample_weight, check_input=False)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 386, in fit
    trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 1042, in __call__
    self.retrieve()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 921, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 439, in result
    return self.__get_result()
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    2.0s
[Parallel(n_jobs=4)]: Done 108 out of 108 | elapsed:    4.2s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 108 out of 108 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  58 tasks      | elapsed:    1.6s
[Parallel(n_jobs=4)]: Done 108 out of 108 | elapsed:    2.8s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 108 out of 108 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  58 tasks      | elapsed:    1.8s
[Parallel(n_jobs=4)]: Done 101 out of 108 | elapsed:    2.9s remaining:    0.1s
[Parallel(n_jobs=4)]: Done 108 out of 108 | elapsed:    3.0s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 108 out of 108 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  58 tasks      | elapsed:    1.5s
[Parallel(n_jobs=4)]: Done 108 out of 108 | elapsed:    2.8s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 108 out of 108 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  58 tasks      | elapsed:    1.5s
[Parallel(n_jobs=4)]: Done 101 out of 108 | elapsed:    2.7s remaining:    0.1s
[Parallel(n_jobs=4)]: Done 108 out of 108 | elapsed:    2.8s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 108 out of 108 | elapsed:    0.0s finished
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 348, in fit
    self._validate_estimator()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_base.py", line 134, in _validate_estimator
    raise ValueError("n_estimators must be greater than zero, "
ValueError: n_estimators must be greater than zero, got 0.

  warnings.warn("Estimator fit failed. The score on this train-test"
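
Note on the warning above: here the sampled n_estimators was 0, which a random forest cannot be built with. As a hedged sketch (not the notebook's actual search setup, which is not visible in this output), the example below runs a small RandomizedSearchCV with lower bounds that keep every sampled value valid, and sets error_score="raise" so an invalid candidate stops the search immediately instead of producing a nan score. The synthetic data, the names X_demo, y_demo and param_dist, and the ranges are all assumptions for illustration.

```python
from scipy.stats import randint as sp_randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Small synthetic data stands in for the vectorized tweets (assumption).
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

# Hypothetical ranges; the lower bounds keep every sampled value valid.
param_dist = {
    "n_estimators": sp_randint(10, 50),   # always >= 10, never 0
    "max_leaf_nodes": sp_randint(2, 20),  # always >= 2
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_dist,
    n_iter=3,
    cv=3,
    error_score="raise",  # stop on the first failed fit instead of scoring nan
    random_state=0,
)
search.fit(X_demo, y_demo)
print(search.best_params_)
```

With error_score="raise" a bad parameter range surfaces as a single exception at the first failing fit, which is easier to read than one FitFailedWarning per fold.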
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  16 out of  16 | elapsed:    0.1s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  16 out of  16 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 431, in _process_worker
    r = call_item()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in __call__
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in <listcomp>
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 170, in _parallel_build_trees
    tree.fit(X, y, sample_weight=sample_weight, check_input=False)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 386, in fit
    trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 1042, in __call__
    self.retrieve()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 921, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 439, in result
    return self.__get_result()
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   7 out of   9 | elapsed:    0.9s remaining:    0.2s
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   7 out of   9 | elapsed:    1.0s remaining:    0.2s
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 431, in _process_worker
    r = call_item()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in __call__
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in <listcomp>
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 168, in _parallel_build_trees
    tree.fit(X, y, sample_weight=curr_sample_weight, check_input=False)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 386, in fit
    trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 1042, in __call__
    self.retrieve()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 921, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 439, in result
    return self.__get_result()
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 431, in _process_worker
    r = call_item()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in __call__
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in <listcomp>
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 170, in _parallel_build_trees
    tree.fit(X, y, sample_weight=sample_weight, check_input=False)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 386, in fit
    trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 1042, in __call__
    self.retrieve()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 921, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 439, in result
    return self.__get_result()
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  35 out of  35 | elapsed:    2.1s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  35 out of  35 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  28 out of  35 | elapsed:    0.8s remaining:    0.1s
[Parallel(n_jobs=4)]: Done  35 out of  35 | elapsed:    0.9s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  35 out of  35 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  35 out of  35 | elapsed:    0.9s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  35 out of  35 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  28 out of  35 | elapsed:    1.1s remaining:    0.2s
[Parallel(n_jobs=4)]: Done  35 out of  35 | elapsed:    1.2s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  35 out of  35 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  28 out of  35 | elapsed:    0.8s remaining:    0.1s
[Parallel(n_jobs=4)]: Done  35 out of  35 | elapsed:    0.9s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  35 out of  35 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  58 tasks      | elapsed:    1.5s
[Parallel(n_jobs=4)]: Done  62 out of  69 | elapsed:    1.5s remaining:    0.1s
[Parallel(n_jobs=4)]: Done  69 out of  69 | elapsed:    1.7s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done  69 out of  69 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  58 tasks      | elapsed:    1.4s
[Parallel(n_jobs=4)]: Done  62 out of  69 | elapsed:    1.5s remaining:    0.1s
[Parallel(n_jobs=4)]: Done  69 out of  69 | elapsed:    1.6s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done  69 out of  69 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  58 tasks      | elapsed:    1.4s
[Parallel(n_jobs=4)]: Done  62 out of  69 | elapsed:    1.5s remaining:    0.1s
[Parallel(n_jobs=4)]: Done  69 out of  69 | elapsed:    1.6s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done  69 out of  69 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  58 tasks      | elapsed:    1.4s
[Parallel(n_jobs=4)]: Done  69 out of  69 | elapsed:    1.6s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done  69 out of  69 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  58 tasks      | elapsed:    1.4s
[Parallel(n_jobs=4)]: Done  69 out of  69 | elapsed:    1.6s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done  69 out of  69 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    1.7s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    7.2s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    1.9s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    7.2s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    1.6s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    7.5s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    1.6s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    6.7s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  58 tasks      | elapsed:    2.8s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    6.8s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 431, in _process_worker
    r = call_item()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in __call__
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in <listcomp>
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 168, in _parallel_build_trees
    tree.fit(X, y, sample_weight=curr_sample_weight, check_input=False)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 386, in fit
    trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 1042, in __call__
    self.retrieve()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 921, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 439, in result
    return self.__get_result()
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 431, in _process_worker
    r = call_item()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in __call__
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in <listcomp>
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 170, in _parallel_build_trees
    tree.fit(X, y, sample_weight=sample_weight, check_input=False)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 386, in fit
    trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 1042, in __call__
    self.retrieve()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 921, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 439, in result
    return self.__get_result()
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 431, in _process_worker
    r = call_item()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in __call__
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in <listcomp>
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 168, in _parallel_build_trees
    tree.fit(X, y, sample_weight=curr_sample_weight, check_input=False)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 386, in fit
    trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 1042, in __call__
    self.retrieve()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 921, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 439, in result
    return self.__get_result()
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  38 tasks      | elapsed:    1.5s
[Parallel(n_jobs=4)]: Done 121 out of 121 | elapsed:    1.9s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 121 out of 121 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done 121 out of 121 | elapsed:    0.7s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 121 out of 121 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done 121 out of 121 | elapsed:    0.6s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 121 out of 121 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done 121 out of 121 | elapsed:    0.6s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 121 out of 121 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done 121 out of 121 | elapsed:    0.5s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 121 out of 121 | elapsed:    0.0s finished
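
Note on the FitFailedWarning messages above: each one reports "max_leaf_nodes 1 must be either None or larger than 1", so the search evidently tried a candidate with max_leaf_nodes=1, which the tree estimator rejects. One likely cause (an assumption, since the parameter distribution is not shown in this output) is a sampling range for max_leaf_nodes that starts at 1. The sketch below shows one way to avoid the failed fits by starting the range at 2. The synthetic data, the names X_demo, y_demo and param_dist, and every numeric setting are illustrative placeholders, not the values used in this project.

```python
import numpy as np
from scipy.stats import randint as sp_randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Small synthetic data set standing in for the vectorized tweets (assumption).
X_demo, y_demo = make_classification(n_samples=120, n_features=10, random_state=0)

# scipy's randint(low, high) samples integers in [low, high).
# Starting max_leaf_nodes at 2 keeps every sampled value valid for the trees.
param_dist = {
    "max_leaf_nodes": sp_randint(2, 20),
    "max_depth": sp_randint(1, 10),
}

search = RandomizedSearchCV(
    RandomForestClassifier(n_estimators=10, random_state=0),
    param_distributions=param_dist,
    n_iter=5,
    cv=3,
    random_state=0,
    error_score="raise",  # fail loudly instead of silently scoring nan
)
search.fit(X_demo, y_demo)

best_leaf_nodes = search.best_params_["max_leaf_nodes"]
print(search.best_params_)
```

With error_score="raise", an invalid candidate stops the search immediately with the underlying ValueError, rather than filling the results with nan scores and repeating the same warning for every failed fold.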
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 431, in _process_worker
    r = call_item()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in __call__
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in <listcomp>
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 170, in _parallel_build_trees
    tree.fit(X, y, sample_weight=sample_weight, check_input=False)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 386, in fit
    trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 1042, in __call__
    self.retrieve()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 921, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 439, in result
    return self.__get_result()
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 431, in _process_worker
    r = call_item()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in __call__
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in <listcomp>
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 170, in _parallel_build_trees
    tree.fit(X, y, sample_weight=sample_weight, check_input=False)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 386, in fit
    trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 1042, in __call__
    self.retrieve()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 921, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 439, in result
    return self.__get_result()
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 431, in _process_worker
    r = call_item()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in __call__
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 252, in <listcomp>
    return [func(*args, **kwargs)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 170, in _parallel_build_trees
    tree.fit(X, y, sample_weight=sample_weight, check_input=False)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 386, in fit
    trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 1042, in __call__
    self.retrieve()
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\parallel.py", line 921, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Users\tiffa\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 439, in result
    return self.__get_result()
  File "C:\Users\tiffa\anaconda3\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
building tree 1 of 144
building tree 2 of 144
building tree 3 of 144
building tree 4 of 144
building tree 5 of 144
building tree 6 of 144
building tree 7 of 144
building tree 8 of 144
building tree 9 of 144
building tree 10 of 144
building tree 11 of 144
building tree 12 of 144
building tree 13 of 144
building tree 14 of 144
building tree 15 of 144
building tree 16 of 144
building tree 17 of 144
building tree 18 of 144
building tree 19 of 144
building tree 20 of 144
building tree 21 of 144
building tree 22 of 144
building tree 23 of 144
building tree 24 of 144
building tree 25 of 144
building tree 26 of 144
building tree 27 of 144
building tree 28 of 144
building tree 29 of 144
building tree 30 of 144
building tree 31 of 144
building tree 32 of 144
building tree 33 of 144
building tree 34 of 144
building tree 35 of 144
building tree 36 of 144
building tree 37 of 144
building tree 38 of 144
building tree 39 of 144
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    2.1s
building tree 40 of 144
building tree 41 of 144
building tree 42 of 144
building tree 43 of 144
building tree 44 of 144
building tree 45 of 144
building tree 46 of 144
building tree 47 of 144
building tree 48 of 144
building tree 49 of 144
building tree 50 of 144
building tree 51 of 144
building tree 52 of 144
building tree 53 of 144
building tree 54 of 144
building tree 55 of 144
building tree 56 of 144
building tree 57 of 144
building tree 58 of 144
building tree 59 of 144
building tree 60 of 144
building tree 61 of 144
building tree 62 of 144
building tree 63 of 144
building tree 64 of 144
building tree 65 of 144
building tree 66 of 144
building tree 67 of 144
building tree 68 of 144
building tree 69 of 144
building tree 70 of 144
building tree 71 of 144
building tree 72 of 144
building tree 73 of 144
building tree 74 of 144
building tree 75 of 144
building tree 76 of 144
building tree 77 of 144
building tree 78 of 144
building tree 79 of 144
building tree 80 of 144
building tree 81 of 144
building tree 82 of 144
building tree 83 of 144
building tree 84 of 144
building tree 85 of 144
building tree 86 of 144
building tree 87 of 144
building tree 88 of 144
building tree 89 of 144
building tree 90 of 144
building tree 91 of 144
building tree 92 of 144
building tree 93 of 144
building tree 94 of 144
building tree 95 of 144
building tree 96 of 144
building tree 97 of 144
building tree 98 of 144
building tree 99 of 144
building tree 100 of 144
building tree 101 of 144
building tree 102 of 144
building tree 103 of 144
building tree 104 of 144
building tree 105 of 144
building tree 106 of 144
building tree 107 of 144
building tree 108 of 144
building tree 109 of 144
building tree 110 of 144
building tree 111 of 144
building tree 112 of 144
building tree 113 of 144
building tree 114 of 144
building tree 115 of 144
building tree 116 of 144
building tree 117 of 144
building tree 118 of 144
building tree 119 of 144
building tree 120 of 144
building tree 121 of 144
building tree 122 of 144
building tree 123 of 144
building tree 124 of 144
building tree 125 of 144
building tree 126 of 144
building tree 127 of 144
building tree 128 of 144
building tree 129 of 144
building tree 130 of 144
building tree 131 of 144
building tree 132 of 144
building tree 133 of 144
building tree 134 of 144
building tree 135 of 144
building tree 136 of 144
building tree 137 of 144
building tree 138 of 144
building tree 139 of 144
building tree 140 of 144
building tree 141 of 144
building tree 142 of 144
building tree 143 of 144
building tree 144 of 144
{'bootstrap': False, 'criterion': 'gini', 'max_depth': None, 'max_leaf_nodes': None, 'min_samples_leaf': 1, 'n_estimators': 144}
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    8.5s finished

5th Random Forest Model

In [663]:
# Refit the random forest with the number of trees selected by the randomized search
forest2 = RandomForestClassifier(criterion='gini', verbose=2, n_jobs=4, n_estimators=144, min_samples_leaf=1, min_impurity_decrease=0.0)
forest2_stem_tfidf = forest2.fit(X_train_stem_tfidf, y_train_stem_tfidf)
# Print the fitted model; its performance is checked through cross-validation in the next cell
print(forest2_stem_tfidf)
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
building tree 1 of 144
building tree 2 of 144
building tree 3 of 144
building tree 4 of 144
building tree 5 of 144
building tree 6 of 144
building tree 7 of 144
building tree 8 of 144
building tree 9 of 144
building tree 10 of 144
building tree 11 of 144
building tree 12 of 144
building tree 13 of 144
building tree 14 of 144
building tree 15 of 144
building tree 16 of 144
building tree 17 of 144
building tree 18 of 144
building tree 19 of 144
building tree 20 of 144
building tree 21 of 144
building tree 22 of 144
building tree 23 of 144
building tree 24 of 144
building tree 25 of 144
building tree 26 of 144
building tree 27 of 144
building tree 28 of 144
building tree 29 of 144
building tree 30 of 144
building tree 31 of 144
building tree 32 of 144
building tree 33 of 144
building tree 34 of 144
building tree 35 of 144
building tree 36 of 144
building tree 37 of 144
building tree 38 of 144
building tree 39 of 144
building tree 40 of 144
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    1.3s
building tree 41 of 144
building tree 42 of 144
building tree 43 of 144
building tree 44 of 144
building tree 45 of 144
building tree 46 of 144
building tree 47 of 144
building tree 48 of 144
building tree 49 of 144
building tree 50 of 144
building tree 51 of 144
building tree 52 of 144
building tree 53 of 144
building tree 54 of 144
building tree 55 of 144
building tree 56 of 144
building tree 57 of 144
building tree 58 of 144
building tree 59 of 144
building tree 60 of 144
building tree 61 of 144
building tree 62 of 144
building tree 63 of 144
building tree 64 of 144
building tree 65 of 144
building tree 66 of 144
building tree 67 of 144
building tree 68 of 144
building tree 69 of 144
building tree 70 of 144
building tree 71 of 144
building tree 72 of 144
building tree 73 of 144
building tree 74 of 144
building tree 75 of 144
building tree 76 of 144
building tree 77 of 144
building tree 78 of 144
building tree 79 of 144
building tree 80 of 144
building tree 81 of 144
building tree 82 of 144
building tree 83 of 144
building tree 84 of 144
building tree 85 of 144
building tree 86 of 144
building tree 87 of 144
building tree 88 of 144
building tree 89 of 144
building tree 90 of 144
building tree 91 of 144
building tree 92 of 144
building tree 93 of 144
building tree 94 of 144
building tree 95 of 144
building tree 96 of 144
building tree 97 of 144
building tree 98 of 144
building tree 99 of 144
building tree 100 of 144
building tree 101 of 144
building tree 102 of 144
building tree 103 of 144
building tree 104 of 144
building tree 105 of 144
building tree 106 of 144
building tree 107 of 144
building tree 108 of 144
building tree 109 of 144
building tree 110 of 144
building tree 111 of 144
building tree 112 of 144
building tree 113 of 144
building tree 114 of 144
building tree 115 of 144
building tree 116 of 144
building tree 117 of 144
building tree 118 of 144
building tree 119 of 144
building tree 120 of 144
building tree 121 of 144
building tree 122 of 144
building tree 123 of 144
building tree 124 of 144
building tree 125 of 144
building tree 126 of 144
building tree 127 of 144
building tree 128 of 144
building tree 129 of 144
building tree 130 of 144
building tree 131 of 144
building tree 132 of 144
building tree 133 of 144
building tree 134 of 144
building tree 135 of 144
building tree 136 of 144
building tree 137 of 144
building tree 138 of 144
building tree 139 of 144
building tree 140 of 144
building tree 141 of 144
building tree 142 of 144
building tree 143 of 144
building tree 144 of 144
RandomForestClassifier(n_estimators=144, n_jobs=4, verbose=2)
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    5.8s finished
In [664]:
forest2_stem_tfidf_train = np.mean(cross_val_score(forest2_stem_tfidf,X_train_stem_tfidf,y_train_stem_tfidf,cv=10))
forest2_stem_tfidf_train
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:   14.0s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:   17.7s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  58 tasks      | elapsed:    2.2s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    5.1s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  58 tasks      | elapsed:    2.2s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    4.9s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  58 tasks      | elapsed:    2.0s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    4.6s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  58 tasks      | elapsed:    2.0s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    4.6s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  58 tasks      | elapsed:    2.1s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    5.0s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  58 tasks      | elapsed:    2.0s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    4.6s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  58 tasks      | elapsed:    1.9s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    4.5s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  58 tasks      | elapsed:    2.2s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    5.1s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    0.0s finished
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  58 tasks      | elapsed:    1.9s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    4.5s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    0.0s finished
Out[664]:
0.7604390243902439
In [665]:
forest2_stem_tfidf_test = forest2_stem_tfidf.score(X_test_stem_tfidf, y_test_stem_tfidf)
forest2_stem_tfidf_test
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 144 out of 144 | elapsed:    0.0s finished
Out[665]:
0.7777777777777778

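Even after RandomizedSearchCV, the model did not improve. One likely reason is that a randomized search evaluates only a fixed number of randomly sampled hyperparameter combinations, unlike GridSearchCV, which exhaustively tries every combination in the grid. Both cross-validate on the full training set, so it is the parameter space, not the documents, that is sampled. Better settings may therefore have been missed, especially since every candidate that drew max_leaf_nodes = 1 failed to fit and was scored as nan. The selected parameters (max_depth = None, max_leaf_nodes = None, min_samples_leaf = 1, criterion = 'gini') are also essentially the defaults, and the refit model above keeps the default bootstrap = True rather than the selected bootstrap = False, so little change in accuracy is expected. Despite this, the cross-validated training accuracy (about 76%) and the test accuracy (about 78%) are very close to those of the previous Random Forest models.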
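
As a rough illustration of the point about failed candidates, the sketch below runs a randomized search in which every sampled value is valid for RandomForestClassifier. The synthetic data from make_classification, the names X_demo, y_demo, rf_param_dist and rf_search, and the specific candidate values are assumptions made for the example; they are not the project's data or the search that was actually run.

```python
import numpy as np
from scipy.stats import randint as sp_randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (assumption): swap in X_train_stem_tfidf / y_train_stem_tfidf.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_classes=3, random_state=1
)

# Every value below is valid; max_leaf_nodes is either None or larger than 1.
rf_param_dist = {
    "max_depth": [3, 10, None],
    "max_leaf_nodes": [None, 10, 50],
    "min_samples_leaf": sp_randint(1, 4),  # draws integers 1, 2 or 3
    "bootstrap": [True, False],
    "criterion": ["gini", "entropy"],
}

rf_search = RandomizedSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=1),
    param_distributions=rf_param_dist,
    n_iter=5,             # number of sampled parameter combinations
    cv=3,
    random_state=1,
    error_score="raise",  # stop on a failed fit instead of silently recording nan
)
rf_search.fit(X_demo, y_demo)

# Count candidates whose mean cross-validation score is nan (should be none here).
n_failed = int(np.isnan(rf_search.cv_results_["mean_test_score"]).sum())
print(rf_search.best_params_, "failed candidates:", n_failed)
```

Setting error_score="raise" is a design choice for debugging: an invalid parameter value surfaces immediately instead of being hidden among nan scores.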

After running the same random forest model on all combinations of pre-processing methods (stemming or lemmatization, CountVectorizer or TF-IDF), there was not much difference between the cross-validated training results. All combinations reached approximately 76-77% accuracy.
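
One possible way to organise such a comparison is to wrap each vectorizer and the classifier in a pipeline and cross-validate each pipeline in a loop. This is only a sketch: the eight texts and their labels are invented placeholders (not rows from Tweets.csv), and the names texts, labels and results are hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Invented placeholder texts and labels (assumption, not taken from the dataset).
texts = [
    "great flight and friendly crew",
    "loved the quick boarding",
    "thanks for the smooth trip",
    "best airline service ever",
    "flight delayed again and no updates",
    "lost my bag terrible support",
    "rude staff and a cancelled flight",
    "worst delay i have ever had",
]
labels = ["positive"] * 4 + ["negative"] * 4

results = {}
for name, vectorizer in [("CountVectorizer", CountVectorizer()),
                         ("TF-IDF", TfidfVectorizer())]:
    # The vectorizer sits inside the pipeline, so it is refit on each training fold only.
    model = make_pipeline(vectorizer, RandomForestClassifier(n_estimators=25, random_state=1))
    results[name] = cross_val_score(model, texts, labels, cv=2).mean()
    print(f"{name}: mean CV accuracy = {results[name]:.3f}")
```

With the real data, the stemmed or lemmatized text column would replace texts, and cv could be raised to 10 to match the cells above.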

I also attempted to tune the parameters. Increasing the number of trees in the random forest made little difference. Increasing max_depth did raise accuracy, but the improvement stopped at a max_depth of 50. Leaving max_depth unrestricted (None) gave the same accuracy as max_depth = 50.
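
The effect of max_depth can be examined systematically with scikit-learn's validation_curve, which cross-validates the model once per candidate value. The sketch below uses synthetic data as a stand-in (an assumption), and the list of depths is illustrative rather than the exact values tried in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import validation_curve

# Synthetic stand-in data (assumption): swap in the vectorized training tweets.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_classes=3, random_state=1
)

depths = [2, 5, 10, 50]  # illustrative candidate depths
train_scores, cv_scores = validation_curve(
    RandomForestClassifier(n_estimators=20, random_state=1),
    X_demo, y_demo,
    param_name="max_depth",
    param_range=depths,
    cv=5,
)

# Each row of cv_scores holds the per-fold accuracies for one depth.
mean_cv = cv_scores.mean(axis=1)
for depth, score in zip(depths, mean_cv):
    print(f"max_depth={depth}: mean CV accuracy = {score:.3f}")
```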

Although the differences in accuracy between the random forest models were small, the lemmatization/CountVectorizer combination performed best on the training data, but only by a minuscule margin over the models with other pre-processing and vectorizer techniques.

Only random forest has been attempted so far. Other model types, such as support vector machines (SVM), Naive Bayes, and neural networks, are often considered better suited to sentiment analysis classification problems.

Please see below for more models in an attempt to improve prediction accuracy.

Decision Tree

Setting the criterion to 'gini' (the default) gave slightly better results than 'entropy', but the difference was small. Increasing max_depth to 10 yielded a decent accuracy of 70-72% on both the training (cross-validation) and test data. Setting max_depth higher than 10 did not improve the model further.

In [562]:
#Decision Tree
from sklearn.tree import DecisionTreeClassifier

dt = DecisionTreeClassifier(criterion = 'gini', max_depth = 10, random_state=1)

dt_stem_vec = dt.fit(X_train_stem_vec, y_train_stem_vec)
In [563]:
dt_stem_vec_train = np.mean(cross_val_score(dt_stem_vec,X_train_stem_vec,y_train_stem_vec,cv=10))
dt_stem_vec_train
Out[563]:
0.6987696265243902
In [564]:
#Test this decision tree model (Stemming-CountVectorizer) on the test data
dt_stem_vec_test = dt_stem_vec.score(X_test_stem_vec, y_test_stem_vec)
dt_stem_vec_test
Out[564]:
0.7158469945355191

Interestingly, decision trees tend to overfit the training data, but that does not appear to be the case here: the test accuracy (about 72%) is slightly higher than the cross-validated training accuracy (about 70%). Nonetheless, the results were not the highest despite tuning the parameters.
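
A common way to check for overfitting more directly is to compare the accuracy on the training folds with the accuracy on the held-out folds: a large gap suggests the tree is memorising the training data. The sketch below does this for a few depths using cross_validate with return_train_score=True. The synthetic data and the names X_demo, y_demo and gaps are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): swap in X_train_stem_vec / y_train_stem_vec.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_classes=3, random_state=1
)

gaps = {}
for depth in (3, 10, None):
    tree = DecisionTreeClassifier(criterion="gini", max_depth=depth, random_state=1)
    res = cross_validate(tree, X_demo, y_demo, cv=5, return_train_score=True)
    train_acc = res["train_score"].mean()  # accuracy on the folds used for fitting
    cv_acc = res["test_score"].mean()      # accuracy on the held-out folds
    gaps[depth] = train_acc - cv_acc
    print(f"max_depth={depth}: train={train_acc:.3f}, cv={cv_acc:.3f}, gap={gaps[depth]:.3f}")
```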

RandomizedSearchCV for Decision Tree Model:

In [378]:
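#RandomizedSearchCV:
# Note: max_leaf_nodes=1 is not a valid value (it must be None or larger than 1),
# so every candidate that samples it fails to fit and is scored as nan (see the warnings below).
dt_random_param_dist = {"max_depth": [1, 3, None],
              "max_leaf_nodes": [1, None],
              "min_samples_leaf": sp_randint(1, 3),
              "criterion": ["gini", "entropy"]}

# run randomized search
samples = 20  # number of random parameter combinations to try
dtrandomCV = RandomizedSearchCV(dt, param_distributions=dt_random_param_dist, n_iter=samples)  # cv left at its default (5-fold since scikit-learn 0.22)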


dtrandomCV.fit(X_train_stem_vec, y_train_stem_vec)

 
print(dtrandomCV.best_params_)
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:548: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 890, in fit
    super().fit(
  File "C:\Users\tiffa\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 284, in fit
    raise ValueError(("max_leaf_nodes {0} must be either None "
ValueError: max_leaf_nodes 1 must be either None or larger than 1

  warnings.warn("Estimator fit failed. The score on this train-test"
{'criterion': 'gini', 'max_depth': None, 'max_leaf_nodes': None, 'min_samples_leaf': 2}

Decision Tree 2nd Model:

I am changing the parameters to those found by RandomizedSearchCV.

In [380]:
#Decision Tree
from sklearn.tree import DecisionTreeClassifier

dt2 = DecisionTreeClassifier(criterion = 'gini', max_depth = None, random_state=1, min_samples_leaf=2)

dt2_stem_vec = dt2.fit(X_train_stem_vec, y_train_stem_vec)
In [381]:
print (np.mean(cross_val_score(dt2_stem_vec,X_train_stem_vec,y_train_stem_vec,cv=10)))
0.6928186928353658
In [382]:
#Test this model (Stemming-CountVectorizer) on the test data
dt2_stem_vec_test = dt2_stem_vec.score(X_test_stem_vec, y_test_stem_vec)
dt2_stem_vec_test
Out[382]:
0.7040072859744991

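The parameters found by RandomizedSearchCV turned out to perform worse. A likely reason is that RandomizedSearchCV does not try every parameter combination the way GridSearchCV does: it cross-validates only a fixed number of combinations (n_iter) sampled at random from the ranges given, so it can miss the best setting. In this run, the warnings above also show that some sampled candidates were invalid (max_leaf_nodes of 1) and scored nan, which left even fewer usable candidates. The advantage of RandomizedSearchCV is speed: it can cover wide ranges or distributions at a fraction of GridSearchCV's cost, and with enough iterations it often lands close to the best setting.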
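
A minimal sketch of a randomized search that avoids the failed fits seen in the warnings. The data here is synthetic and the parameter ranges are assumptions chosen for illustration, not the ones used in the original search; the key detail is that max_leaf_nodes is sampled starting at 2, since a value of 1 is rejected by DecisionTreeClassifier.

```python
import numpy as np
from scipy.stats import randint as sp_randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class data standing in for the vectorized tweets (assumption).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_classes=3, random_state=1
)

# Hypothetical search space; sp_randint(2, 50) samples integers from 2 to 49,
# so max_leaf_nodes can never be the invalid value 1.
param_dist = {
    "criterion": ["gini", "entropy"],
    "max_depth": [3, 5, 10, None],
    "max_leaf_nodes": sp_randint(2, 50),
    "min_samples_leaf": sp_randint(1, 10),
}

# Only n_iter combinations are sampled and cross-validated, not the full grid.
search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=1),
    param_distributions=param_dist,
    n_iter=10,
    cv=5,
    random_state=1,
)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(search.cv_results_["mean_test_score"])
```

Raising n_iter widens the search at the cost of run time, which is the trade-off against an exhaustive grid.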

Nonetheless, the Decision Tree model is not the best model for this data.

3rd Decision Tree model:

In [516]:
#Decision Tree for Stemming-TFIDF
from sklearn.tree import DecisionTreeClassifier

dt3 = DecisionTreeClassifier(criterion = 'gini', max_depth = None, random_state=1)

dt3_stem_tfidf = dt3.fit(X_train_stem_tfidf, y_train_stem_tfidf)
In [517]:
#Stemming-TFIDF train
dt3_stem_tfidf_train = np.mean(cross_val_score(dt3_stem_tfidf,X_train_stem_tfidf,y_train_stem_tfidf,cv=10))
dt3_stem_tfidf_train
Out[517]:
0.6760322027439024

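Using TF-IDF with the Decision Tree is not as good. The cross-validated accuracy on the training data drops to about 68%, and the test accuracy would probably be similar. The count-based tree above scored about the same on the test data as in cross-validation (70% versus 69%), so overfitting does not appear to be the problem.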

Bagging

I tried n_estimators of 50, 100 and 150 to see whether it would make a difference. The accuracy was essentially unchanged, so I left it at 50 to save time.

In [343]:
#Bagging

from sklearn.ensemble import BaggingClassifier

bgcl = BaggingClassifier(base_estimator=dt, n_estimators=50,random_state=1)
#bgcl = BaggingClassifier(n_estimators=50,random_state=1)

bag_stem_vec = bgcl.fit(X_train_stem_vec, y_train_stem_vec)
In [539]:
bag_stem_vec_train = np.mean(cross_val_score(bag_stem_vec,X_train_stem_vec,y_train_stem_vec,cv=10))
bag_stem_vec_train
Out[539]:
0.7586834984756098
In [540]:
#Test this model (Stemming-CountVectorizer) on the test data
bag_stem_vec_test = bag_stem_vec.score(X_test_stem_vec, y_test_stem_vec)
bag_stem_vec_test
Out[540]:
0.773224043715847

Compared to all the other types of models, bagging is not the strongest for this dataset. The model fares well nonetheless.

Adaboost Model

AdaBoost proved to be fairly strong compared to the other model types, yielding 79% accuracy on the test data. It comes second to logistic regression. For this reason, I put more effort into running more models of this type, hoping to improve the accuracy score without overfitting.

1st Adaboost model:

For this model, I tuned the hyperparameters by changing n_estimators; 150 yielded the best outcome. I left the base estimator at its default, a decision tree classifier with a depth of 1. I tried improving the model by increasing the depth to 3, but the performance was more or less the same.

In [536]:
from sklearn.ensemble import AdaBoostClassifier
abcl = AdaBoostClassifier(n_estimators=150, random_state=1)        #Base estimator = Decision Tree, with a depth of 1
abcl_stem_vec = abcl.fit(X_train_stem_vec, y_train_stem_vec)
In [537]:
abcl_stem_vec_train = np.mean(cross_val_score(abcl_stem_vec,X_train_stem_vec,y_train_stem_vec,cv=10))
abcl_stem_vec_train
Out[537]:
0.7652206554878049

2nd Adaboost model:

I changed the depth of the base tree to 3 and the performance stayed the same. I then changed the criterion from gini to entropy, and the performance still remained the same.

In [334]:
from sklearn.ensemble import AdaBoostClassifier

# This is for the base_estimator for the Adaboost model.
dTreeR = DecisionTreeClassifier(criterion = 'entropy', max_depth = 3, random_state=1)

#Trying out different hyperparameters for Adaboost model
abcl2 = AdaBoostClassifier(base_estimator= dTreeR, n_estimators=150, random_state=1)        #Base estimator = dTreeR, with a depth of 3, and criterion to entropy
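#Fit abcl2. The original cell called abcl.fit here, which refit the earlier model,
#so the scores recorded below came from that run and not from abcl2.
abcl2_stem_vec = abcl2.fit(X_train_stem_vec, y_train_stem_vec)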
In [335]:
print (np.mean(cross_val_score(abcl2_stem_vec,X_train_stem_vec,y_train_stem_vec,cv=10)))
0.7693205030487805
In [672]:
#Test this Adaboost model (Stemming-CountVectorizer) on the test data
abcl2_stem_vec_test = abcl2_stem_vec.score(X_test_stem_vec, y_test_stem_vec)
abcl2_stem_vec_test
Out[672]:
0.7903005464480874

It seems that for the AdaBoost model, changing n_estimators is what changes the performance. This is the best AdaBoost model, so I decided to evaluate it on the test data.
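
When comparing base estimators, it helps to confirm that the fitted ensemble really contains the tree that was configured. The sketch below does this on synthetic data (an assumption, as are the names ending in _demo). It also picks the keyword at run time, because the base_estimator argument was renamed to estimator in newer scikit-learn releases.

```python
import inspect

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class data standing in for the vectorized tweets (assumption).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_classes=3, random_state=1
)

base_tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=1)

# Newer scikit-learn uses `estimator`; older releases use `base_estimator`.
init_params = inspect.signature(AdaBoostClassifier).parameters
keyword = "estimator" if "estimator" in init_params else "base_estimator"

abcl2_demo = AdaBoostClassifier(n_estimators=20, random_state=1, **{keyword: base_tree})
abcl2_demo.fit(X_demo, y_demo)

# Each fitted member should carry the settings of the configured base tree.
first_tree = abcl2_demo.estimators_[0]
print(first_tree.criterion, first_tree.max_depth)
```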

3rd Adaboost model:

Instead of CountVectorizer, I used the TF-IDF data for this 3rd model. Performance worsened a little. To avoid data leakage, I will not test this model on the test data.

In [519]:
from sklearn.ensemble import AdaBoostClassifier
abcl = AdaBoostClassifier(n_estimators=150, random_state=1)        #Base estimator = dTree, with a depth of 1
abcl_stem_tfidf = abcl.fit(X_train_stem_tfidf, y_train_stem_tfidf)
In [520]:
abcl_stem_tfidf_train = np.mean(cross_val_score(abcl_stem_tfidf,X_train_stem_tfidf,y_train_stem_tfidf,cv=10))
abcl_stem_tfidf_train
Out[520]:
0.7540013338414634

Gradient Boost Model

Gradient Boost performed decently, but it doesn't measure up to the simple logistic regression model for this data.

In [533]:
from sklearn.ensemble import GradientBoostingClassifier
gbcl = GradientBoostingClassifier(n_estimators = 100,random_state=1)
gbcl_stem_vec = gbcl.fit(X_train_stem_vec, y_train_stem_vec)
In [534]:
gbcl_stem_vec_train = np.mean(cross_val_score(gbcl_stem_vec,X_train_stem_vec,y_train_stem_vec,cv=10))
gbcl_stem_vec_train
Out[534]:
0.7569270198170732
In [535]:
#Test this model (Stemming-CountVectorizer) on the test data
gbcl_stem_vec_test = gbcl_stem_vec.score(X_test_stem_vec, y_test_stem_vec)
gbcl_stem_vec_test
Out[535]:
0.7666211293260473

K Nearest Neighbors:

K Nearest Neighbors is one of the weakest models among all the ones tried here. It may not be well suited to sentiment analysis. I attempted 2 models, one with CountVectorizer data and the other with TF-IDF data. The first had an accuracy of around 56-57% after 10-fold cross-validation on the training set, so this is not the model I would ultimately choose.

In [293]:
#KNN

from sklearn.neighbors import KNeighborsClassifier

knc = KNeighborsClassifier()
knc_stem_vec = knc.fit(X_train_stem_vec, y_train_stem_vec)
print(knc_stem_vec)
KNeighborsClassifier()
In [294]:
print (np.mean(cross_val_score(knc_stem_vec,X_train_stem_vec,y_train_stem_vec,cv=10)))
0.5658677591463415

To prevent data leakage, we will not run this model on the test data.

2nd KNN model:

In [522]:
#KNN

from sklearn.neighbors import KNeighborsClassifier

knc = KNeighborsClassifier()
knc_stem_tfidf = knc.fit(X_train_stem_tfidf, y_train_stem_tfidf)
print(knc_stem_tfidf)
KNeighborsClassifier()
In [523]:
knc_stem_tfidf_train = np.mean(cross_val_score(knc_stem_tfidf,X_train_stem_tfidf,y_train_stem_tfidf,cv=10))
knc_stem_tfidf_train
Out[523]:
0.720433974847561
In [524]:
#Test this model (Stemming-TFIDF) on the test data
knc_stem_tfidf_test = knc_stem_tfidf.score(X_test_stem_tfidf, y_test_stem_tfidf)
knc_stem_tfidf_test
Out[524]:
0.7249544626593807

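Unlike most of the other models, K Nearest Neighbors works better with the Stemming-TF-IDF data (72%) than with the Stemming-CountVectorizer data (57%); the opposite holds for many of the other models. A plausible reason, not verified here, is that TfidfVectorizer L2-normalizes each tweet's vector by default, so the distances KNN relies on are no longer dominated by tweet length.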
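
One possible explanation, offered as an assumption rather than something verified on this dataset, is that TF-IDF rows are L2-normalized under TfidfVectorizer's default settings. For unit-length vectors the squared Euclidean distance equals 2 - 2 * cosine similarity, so neighbors are ranked by content rather than by tweet length. The toy vectors below are made up for illustration.

```python
import numpy as np

# Made-up count vectors: the first two have the same word mix but different lengths.
short_tweet = np.array([1.0, 1.0, 0.0, 0.0])
long_tweet = np.array([3.0, 3.0, 0.0, 0.0])
other_tweet = np.array([0.0, 1.0, 1.0, 0.0])

def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    return v / np.linalg.norm(v)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Raw counts: the long tweet looks farther away than an unrelated tweet.
raw_same = np.linalg.norm(short_tweet - long_tweet)
raw_other = np.linalg.norm(short_tweet - other_tweet)

# After L2 normalization the length effect disappears.
norm_same = np.linalg.norm(l2_normalize(short_tweet) - l2_normalize(long_tweet))
norm_other = np.linalg.norm(l2_normalize(short_tweet) - l2_normalize(other_tweet))

print("raw distances:       ", raw_same, raw_other)
print("normalized distances:", norm_same, norm_other)
print("2 - 2*cos:           ", 2 - 2 * cosine(short_tweet, other_tweet))
```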

Naive Bayes

Naive Bayes is usually a good model for natural language processing classification. Two models were attempted: one using Multinomial Naive Bayes and the other using Gaussian Naive Bayes.

In [259]:
#Naive Bayes
#Since CountVectorizer yields discrete values from the text, we need to use MultinomialNB()

from sklearn.naive_bayes import MultinomialNB
mnb = MultinomialNB()
mnb_stem_vec= mnb.fit(X_train_stem_vec, y_train_stem_vec)
print(mnb_stem_vec)
MultinomialNB()
In [529]:
mnb_stem_vec_train = np.mean(cross_val_score(mnb_stem_vec,X_train_stem_vec,y_train_stem_vec,cv=10))
mnb_stem_vec_train
Out[529]:
0.7729304496951219
In [264]:
#Test this model (Stemming-CountVectorizer) on the test data
mnb_stem_vec_test = mnb_stem_vec.score(X_test_stem_vec, y_test_stem_vec)
mnb_stem_vec_test
Out[264]:
0.7802823315118397

2nd Naive Bayes Model:

In [525]:
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
gnb_stem_tfidf= gnb.fit(X_train_stem_tfidf, y_train_stem_tfidf)
print(gnb_stem_tfidf)
GaussianNB()
In [527]:
gnb_stem_tfidf_train = np.mean(cross_val_score(gnb_stem_tfidf,X_train_stem_tfidf,y_train_stem_tfidf,cv=10))
gnb_stem_tfidf_train
Out[527]:
0.3585080030487805
In [528]:
#Test this model (Stemming-TFIDF) on the test data
gnb_stem_tfidf_test = gnb_stem_tfidf.score(X_test_stem_tfidf, y_test_stem_tfidf)
gnb_stem_tfidf_test
Out[528]:
0.3738615664845173

Naive Bayes is considered one of the prominent machine learning algorithms for natural language processing; a classic use is separating spam from legitimate email. Stemming and lemmatization did not show much difference when it came to Naive Bayes, but CountVectorizer (77-78%) yielded a much higher accuracy than the TF-IDF vectorizer (around 36-37%). I used GaussianNB with the TF-IDF data because its features are continuous values between 0 and 1, and MultinomialNB with the CountVectorizer data because its features are discrete non-negative counts. MultinomialNB is the one that achieved 77-78% accuracy. Because the two runs differ in both the vectorizer and the Naive Bayes variant, the gap may owe as much to GaussianNB as to TF-IDF.
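
A quick way to check what values each vectorizer produces is to inspect a tiny corpus. The tweets and labels below are made up for illustration, and default vectorizer settings are assumed. The sketch also shows that MultinomialNB only needs non-negative features, so it can be fit on TF-IDF data as well, which would allow the two vectorizers to be compared with the same classifier.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up mini corpus and labels, for illustration only.
toy_tweets = [
    "late flight late again",
    "great crew great flight",
    "flight delayed no refund",
    "thanks for the great service",
]
toy_labels = ["negative", "positive", "negative", "positive"]

counts = CountVectorizer().fit_transform(toy_tweets)
tfidf = TfidfVectorizer().fit_transform(toy_tweets)

# Counts are non-negative integers; TF-IDF values are floats between 0 and 1.
print("count values: ", counts.min(), "to", counts.max())
print("tf-idf values:", round(float(tfidf.min()), 3), "to", round(float(tfidf.max()), 3))

# Under the default settings each TF-IDF row has unit L2 norm.
row_norms = np.sqrt(np.asarray(tfidf.multiply(tfidf).sum(axis=1)).ravel())
print("row norms:", row_norms)

# MultinomialNB accepts any non-negative features, including TF-IDF.
mnb_tfidf_demo = MultinomialNB().fit(tfidf, toy_labels)
print(mnb_tfidf_demo.predict(tfidf))
```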

3rd Naive Bayes Model: Just to see if there is a difference if data is lemmatized.

In [530]:
from sklearn.naive_bayes import MultinomialNB
mnb = MultinomialNB()
mnb_lemma_vec= mnb.fit(X_train_lemma_vec, y_train_lemma_vec)
print(mnb_lemma_vec)
MultinomialNB()
In [531]:
mnb_lemma_lemma_train = np.mean(cross_val_score(mnb_lemma_vec,X_train_lemma_vec,y_train_lemma_vec,cv=10))
mnb_lemma_lemma_train
Out[531]:
0.7729304496951219

Nope, there seems to be no difference after k-fold cross validation.

Support Vector Machine (SVM) Model 1:

In [571]:
from sklearn import svm

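# Perform classification with SVM, kernel=linear
# Note: this rebinds the name svm from the sklearn module to the estimator,
# which is why later cells import the module again before calling svm.SVC
svm = svm.SVC(kernel='linear')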
svm_stem_vec= svm.fit(X_train_stem_vec, y_train_stem_vec)
print(svm_stem_vec)
SVC(kernel='linear')
In [572]:
svm_stem_vec_train = np.mean(cross_val_score(svm_stem_vec,X_train_stem_vec,y_train_stem_vec,cv=10))
svm_stem_vec_train
Out[572]:
0.7812255144817073

This accuracy is on par with that of Random Forest after tuning the hyperparameters. There is a possibility Support Vector Machine could work very well if we tune the hyperparameters.

Support Vector Machine Model 2:

Let's see if Support Vector machine works better with the TF-IDF data.

In [574]:
svm_stem_tfidf= svm.fit(X_train_stem_tfidf, y_train_stem_tfidf)
print(svm_stem_tfidf)
SVC(kernel='linear')
In [575]:
svm_stem_tfidf_train = np.mean(cross_val_score(svm_stem_tfidf,X_train_stem_tfidf,y_train_stem_tfidf,cv=10))
svm_stem_tfidf_train
Out[575]:
0.7905923208841463

Indeed, Support Vector Machine works better on TF-IDF. We see an improvement, even if only a small one (79.1% versus 78.1%).

Support Vector Machine Model 3:

Let's look at a model with lemmatization to see if that makes a difference to the performance.

In [577]:
svm_lemma_tfidf= svm.fit(X_train_lemma_tfidf, y_train_lemma_tfidf)
print(svm_lemma_tfidf)
SVC(kernel='linear')
In [578]:
svm_lemma_tfidf_train = np.mean(cross_val_score(svm_lemma_tfidf,X_train_lemma_tfidf,y_train_lemma_tfidf,cv=10))
svm_lemma_tfidf_train
Out[578]:
0.7905923208841463

After performing cross-validation on the lemmatized training data, the accuracy remained the same as on the stemmed training data.

Support Vector Machine Model 4:

Perhaps changing the kernel can yield higher performance than the linear kernel. We know that the TF-IDF data works better with the support vector machine model than the CountVectorizer data, so we should stick with evaluating the TF-IDF data.

In [582]:
# Perform classification with SVM, kernel=rbf
from sklearn import svm
from sklearn.svm import SVC

svm_rbf = svm.SVC(kernel='rbf')
svm_rbf_stem_tfidf= svm_rbf.fit(X_train_stem_tfidf, y_train_stem_tfidf)
print(svm_rbf_stem_tfidf)
SVC()
In [583]:
svm_rbf_stem_tfidf_train = np.mean(cross_val_score(svm_rbf_stem_tfidf,X_train_stem_tfidf,y_train_stem_tfidf,cv=10))
svm_rbf_stem_tfidf_train
Out[583]:
0.7949842797256098

Applying model on the test data after evaluating all SVM models on training datasets.

In [593]:
svm_rbf_stem_tfidf_test = svm_rbf_stem_tfidf.score(X_test_stem_tfidf, y_test_stem_tfidf)
svm_rbf_stem_tfidf_test
Out[593]:
0.8098816029143898

This is by far the best model. It showed the highest cross-validation accuracy on the training data, meaning it correctly classified the largest fraction of tweets across the negative, neutral and positive classes. To prevent data leakage, I applied this model to the test data only after all the SVM models had been evaluated on the training data using cross-validation. As expected, it scored 81% accuracy on the test data, which is excellent.

In [598]:
svm_rbf_stem_tfidf_pred = svm_rbf_stem_tfidf.predict(X_test_stem_tfidf)
In [599]:
conf_mat_svm_tfidf = confusion_matrix(y_test_stem_tfidf, svm_rbf_stem_tfidf_pred)

print(conf_mat_svm_tfidf)
[[2646  127   41]
 [ 361  479   44]
 [ 175   87  432]]

Accuracy: 81%

Negative: Precision = 2646/(2646 + 361 + 175) = 0.83 (83%); Recall = 2646/(2646 + 127 + 41) = 0.94 (94%)

Neutral: Precision = 479/(127 + 479 + 87) = 0.69 (69%); Recall = 479/(361 + 479 + 44) = 0.54 (54%)

Positive: Precision = 432/(41 + 44 + 432) = 0.84 (84%); Recall = 432/(175 + 87 + 432) = 0.62 (62%)
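
The hand calculations above can be reproduced from the confusion matrix with a few lines of NumPy. This is a small sketch that assumes the rows are true labels, the columns are predicted labels, and the class order is negative, neutral, positive (confusion_matrix sorts labels alphabetically).

```python
import numpy as np

# Confusion matrix copied from the output above.
conf_mat = np.array([
    [2646, 127, 41],
    [361, 479, 44],
    [175, 87, 432],
])
labels = ["negative", "neutral", "positive"]  # assumed class order

true_pos = np.diag(conf_mat)
precision = true_pos / conf_mat.sum(axis=0)  # column sums = predicted totals
recall = true_pos / conf_mat.sum(axis=1)     # row sums = actual totals
accuracy = true_pos.sum() / conf_mat.sum()

for name, p, r in zip(labels, precision, recall):
    print(f"{name}: precision={p:.2f}, recall={r:.2f}")
print(f"accuracy={accuracy:.4f}")
```

The same arrays can be fed any other confusion matrix in this notebook to compare models class by class.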


Support Vector Machine Model 5:

In [595]:
svm_rbf = svm.SVC(kernel='rbf')
svm_rbf_stem_vec= svm_rbf.fit(X_train_stem_vec, y_train_stem_vec)
print(svm_rbf_stem_vec)
SVC()
In [596]:
svm_rbf_stem_vec_train = np.mean(cross_val_score(svm_rbf_stem_vec,X_train_stem_vec,y_train_stem_vec,cv=10))
svm_rbf_stem_vec_train
Out[596]:
0.7893255525914634
In [597]:
svm_rbf_stem_vec_test = svm_rbf_stem_vec.score(X_test_stem_vec, y_test_stem_vec)
svm_rbf_stem_vec_test
Out[597]:
0.811247723132969
In [607]:
svm_rbf_stem_vec_pred = svm_rbf_stem_vec.predict(X_test_stem_vec)
In [608]:
conf_mat_svm_vec = confusion_matrix(y_test_stem_vec, svm_rbf_stem_vec_pred)

print(conf_mat_svm_vec)
[[2630  137   47]
 [ 332  508   44]
 [ 169  100  425]]

Accuracy: 81%

Negative: Precision = 2630/(2630 + 332 + 169) = 0.84 (84%); Recall = 2630/(2630 + 137 + 47) = 0.93 (93%)

Neutral: Precision = 508/(137 + 508 + 100) = 0.68 (68%); Recall = 508/(332 + 508 + 44) = 0.57 (57%)

Positive: Precision = 425/(47 + 44 + 425) = 0.82 (82%); Recall = 425/(169 + 100 + 425) = 0.61 (61%)

Support Vector Machine Model 6:

Because logistic regression performed so well with this text data, I thought that changing the kernel to sigmoid might be beneficial, since logistic regression is built on the S-shaped sigmoid function.
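
For context, the sigmoid kernel in SVC is based on tanh rather than on the logistic function, and it measures similarity between two samples instead of mapping a score to a probability. The two curves are closely related, though: tanh(z) = 2 * logistic(2z) - 1. The sketch below checks this numerically; the function names and the example vectors are illustrative assumptions.

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) function used by logistic regression; output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    """Kernel behind SVC(kernel='sigmoid'): tanh(gamma * <x, y> + coef0)."""
    return np.tanh(gamma * np.dot(x, y) + coef0)

z = np.linspace(-3.0, 3.0, 7)
rescaled_logistic = 2.0 * logistic(2.0 * z) - 1.0  # same curve as tanh(z)

# Two made-up unit-length vectors standing in for TF-IDF rows.
a = np.array([0.6, 0.8, 0.0])
b = np.array([0.0, 0.8, 0.6])
k_ab = sigmoid_kernel(a, b)  # tanh of the dot product 0.64

print(np.round(np.tanh(z), 3))
print(np.round(rescaled_logistic, 3))
print(k_ab)
```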

In [585]:
# Perform classification with SVM, kernel=sigmoid
from sklearn import svm
from sklearn.svm import SVC

svm_sigmoid = svm.SVC(kernel='sigmoid')
svm_sigmoid_stem_tfidf= svm_sigmoid.fit(X_train_stem_tfidf, y_train_stem_tfidf)
print(svm_sigmoid_stem_tfidf)
SVC(kernel='sigmoid')
In [586]:
svm_sigmoid_stem_tfidf_train = np.mean(cross_val_score(svm_sigmoid_stem_tfidf,X_train_stem_tfidf,y_train_stem_tfidf,cv=10))
svm_sigmoid_stem_tfidf_train
Out[586]:
0.7879575076219513

Although performance is pretty good with sigmoid as the kernel, it falls a little short compared to the rbf kernel model. I suspect its performance on the test set will reflect the same.

Support Vector Machine Model 7:

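To improve the model, I tried tuning hyperparameters such as gamma, setting it explicitly to 'scale'. The printed model below is plain SVC(), which indicates that 'scale' is already the default in this version of scikit-learn.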

In [588]:
svm_rbf_scale = svm.SVC(kernel='rbf', gamma='scale')
svm_rbf_scale_stem_tfidf= svm_rbf_scale.fit(X_train_stem_tfidf, y_train_stem_tfidf)
print(svm_rbf_scale_stem_tfidf)
SVC()
In [589]:
svm_rbf_scale_stem_tfidf_train = np.mean(cross_val_score(svm_rbf_scale_stem_tfidf,X_train_stem_tfidf,y_train_stem_tfidf,cv=10))
svm_rbf_scale_stem_tfidf_train
Out[589]:
0.7949842797256098

Setting gamma to 'scale' gave exactly the same cross-validation score as Model 4. This is expected if 'scale' is already the default, as the printed SVC() suggests, because Model 7 is then the same model as Model 4. For this reason, I do not expect any difference on the test data.
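
For reference, scikit-learn documents the two gamma settings as 'scale' = 1 / (n_features * X.var()) and 'auto' = 1 / n_features. The sketch below evaluates both on a made-up sparse-looking matrix (an assumption standing in for TF-IDF data) to show how far apart they can be when most entries are zero and the variance is small.

```python
import numpy as np

def gamma_scale(X):
    """Value used for gamma='scale': 1 / (n_features * X.var())."""
    return 1.0 / (X.shape[1] * X.var())

def gamma_auto(X):
    """Value used for gamma='auto': 1 / n_features."""
    return 1.0 / X.shape[1]

# Made-up stand-in for a TF-IDF matrix: mostly zeros, a few small positive values.
rng = np.random.default_rng(1)
X_demo = np.zeros((200, 500))
rows = rng.integers(0, 200, size=2000)
cols = rng.integers(0, 500, size=2000)
X_demo[rows, cols] = rng.uniform(0.1, 0.5, size=2000)

print("gamma 'scale':", gamma_scale(X_demo))
print("gamma 'auto': ", gamma_auto(X_demo))
```

With a variance well below 1, 'auto' gives a much smaller gamma than 'scale', which makes the RBF kernel much flatter.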

Support Vector Machine Model 8:

In an attempt to improve the model I decreased C from its default of 1 to 0.8. A smaller C means stronger regularization, so the risk here is underfitting rather than overfitting, but it is worth a try.

In [591]:
svm_rbf_c8 = svm.SVC(kernel='rbf', gamma='auto', C=0.8)
svm_rbf_c8_stem_tfidf= svm_rbf_c8.fit(X_train_stem_tfidf, y_train_stem_tfidf)
print(svm_rbf_c8_stem_tfidf)
SVC(C=0.8, gamma='auto')
In [592]:
svm_rbf_c8_stem_tfidf_train = np.mean(cross_val_score(svm_rbf_c8_stem_tfidf,X_train_stem_tfidf,y_train_stem_tfidf,cv=10))
svm_rbf_c8_stem_tfidf_train
Out[592]:
0.6209993330792682

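The accuracy went down by a lot, from about 79% to 62%. Note that this cell also sets gamma to 'auto' while Model 4 used the default, so the drop cannot be attributed to the smaller C alone; the two changes would need to be tested separately.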
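
As a reference point for the 62% figure, the class balance can be read off the row sums of the test confusion matrix shown earlier. A model that always predicted the most common class would score the share computed below on the test set. Whether the training folds have a similar balance is an assumption, but if they do, an accuracy near this value suggests the model is mostly predicting the majority class.

```python
import numpy as np

# Test-set confusion matrix from the RBF TF-IDF model; rows are true labels.
conf_mat = np.array([
    [2646, 127, 41],
    [361, 479, 44],
    [175, 87, 432],
])

class_totals = conf_mat.sum(axis=1)  # number of test tweets per true class
majority_share = class_totals.max() / class_totals.sum()

print("tweets per class:", class_totals)
print(f"majority-class baseline accuracy: {majority_share:.3f}")
```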
